Fast Parallel Algorithms For Graphs and Networks

Danny  Soroker 

abstract 

Many theorems in graph theory give simple characterizations for testing the
existence of objects with certain properties, which can be translated into fast
parallel algorithms. However, transforming these tests into algorithms for
constructing such objects is often a real challenge. In this thesis we develop
fast parallel ("NC") algorithms for several such construction problems.

The first part is about tournaments. (A tournament is a digraph in which
there is precisely one arc between every two vertices.) Two classical results state
that every tournament has a Hamiltonian path and every strongly connected
tournament has a Hamiltonian cycle. We derive efficient parallel algorithms for
finding these objects. Our algorithms yield new proofs of these theorems. In
solving the cycle problem we also solve the problem of finding a Hamiltonian path
with one fixed endpoint. Next we address the problem of constructing a
tournament with a specified degree-sequence, and give an NC algorithm for it which
achieves optimal speedup.

The second part is concerned with making graphs strongly connected via
orientation and augmentation. A graph is strongly orientable if its edges can be
assigned orientations to yield a strongly connected digraph. Robbins' theorem
states the conditions for a graph to be strongly orientable. His theorem was
generalized for mixed graphs, i.e. ones that have both directed and undirected edges.
We give a fast parallel algorithm for strongly orienting mixed graphs. We then
solve the problem of adding a minimum number of arcs to a mixed graph to make
it strongly orientable. This problem was not previously known to have even a
polynomial time sequential solution (a sequential algorithm was discovered
independently by Gusfield). In the process of solving the general problem we
derive solutions for the special cases of undirected and directed graphs.

The final part of the thesis describes a methodology which yields deterministic
parallel algorithms for several supply-demand problems on networks with zero-one
capacities. These problems include: constructing a zero-one matrix with specified
row- and column-sums, constructing a simple digraph with specified in- and
out-degrees and constructing a zero-one flow pattern in a complete network, where
each vertex has a specified net supply or demand. We extend our results to the
case where the input represents upper bounds on supplies and lower bounds on
demands.

CHAPTER ONE
INTRODUCTION 


Parallel computation is becoming a central field of research in computer
science. The main driving forces behind this have been technological advances in the
area of Very Large Scale Integration. The price of computer hardware has been
pushed down to the point where time of execution (rather than cost of components)
is the main limiting factor in many applications. Furthermore, the speed of
processor and memory units has been constantly increasing, but is gradually
approaching saturation due to physical limitations. The combination of these
factors has brought forth the necessity for parallel computers - machines which have
many processors that cooperate and coordinate actions efficiently to solve problems
quickly.

The theory of parallel computation is still in early stages of development, but
is gaining popularity due to its practical importance on one hand and to the
fundamental mathematical problems arising in it on the other. The body of knowledge
on parallel algorithms is much smaller than that on sequential algorithms. One
reason is, of course, time - researchers have been constructing sequential
algorithms ever since the idea of "computer" was born, whereas development of a
coherent theory of parallel computation started only in the late 1970s. Thus, the
body of knowledge on sequential complexity of problems is much larger than that
on parallel complexity.

Another main difference lies in deciding on an appropriate model of
computation. In the sequential world there is a clear resemblance between the standard
Random Access Machine model ([AHU]) and actual working computers -
algorithms developed for this model can be translated quite simply into programs in
existing languages. This is not the case in parallel computing. The prevailing
theoretical model for algorithm design is the Parallel Random Access Machine
(PRAM, to be formally defined in chapter two). It has an unbounded number of
processors working in full synchrony, each performing its own instruction at any
given step and each having unit-time access to a shared memory. It is set apart
from current machines in several aspects:

(1) MIMD vs. SIMD - In the PRAM model different processors perform different
operations at any given step. This property is known as MIMD (Multiple
Instruction Multiple Data). However, machines with the largest number of
processors currently available, such as the Connection Machine of Thinking
Machines Corp. which has as many as 65,536 processors ([Hillis]), are SIMD
(Single Instruction Multiple Data). This means that at each time step all
processors perform the same operation. Current MIMD machines, such as the
NYU Ultracomputer ([GGKMRS]) and the BBN Butterfly ([RT]), have
substantially fewer processors (several hundred).

(2) Shared vs. distributed memory - Both the Ultracomputer and the Butterfly
resemble the PRAM model in that they provide shared memory. However,
access to the shared memory takes more time than local operations.
Furthermore, a large part of the design effort has gone into constructing the
interconnection network between processors and the shared memory, and scaling these
designs to accommodate larger numbers of processors seems to be a very
complex and expensive task. One solution to this problem is to have a
network of processors with no shared memory, in which processors communicate
by sending messages. One such design is the Caltech Cosmic Cube upon
which some commercial computers, such as the Intel iPSC, are based
([SASLMW]). Several strong theoretical results ([Upfal],[KU],[Ranade]) show
that PRAMs can be efficiently simulated by bounded degree networks (i.e.
ones in which the degree of a processor, the number of neighbors it has, is
fixed, independent of the total number of processors). These results give
partial justification for using the PRAM model for algorithm design.

(3)  Synchronous  vs.  asynchronous  -  The  PRAM  is  a  highly  synchronous  model. 
Each  step  of  the  computation  is  performed  synchronously  by  all  processors. 
This  is  impractical  under  current  technology  because  of  problems  in  achieving 
clock  synchronization.  However,  some  form  of  synchronization  is  important  to 
have  and  some  existing  machines  provide  tools  for  this.  For  example,  the 
NYU  Ultracomputer  has  a  "replace-add"  operation  built  into  its  hardware, 
which can be used for synchronization.

Finally we note that in some of the currently most powerful "supercomputers",
such as the Cray-2 and the Goodyear Aerospace Corp. MPP ([SASLMW]), most of
the parallelism is in the form of pipelining and vector operations.

To  summarize,  the  PRAM  model  of  computation  differs  substantially  from 
existing  parallel  computers.  Even  so,  it  is  an  appropriate  model  to  use  when 
studying  the  limits  and  possibilities  of  parallelism.  It  frees  the  algorithm  designer 
from  worrying  about  details  which  are  not  a  fundamental  part  of  the  problem 
being  studied.  Furthermore,  as  stated  above,  it  is  possible  to  automatically 
translate PRAM programs to ones on more realistic models (bounded degree
networks) with a relatively small loss of efficiency. Finally, the PRAM model sets a
goal  for  future  designers  of  parallel  computers. 

This  thesis  is  a  study  in  parallel  algorithms.  The  research  reported  here 
involved  developing  parallel  algorithms  for  several  basic  problems  related  to 
graphs  and  networks.  Graph  theory  is  a  fundamental  area  underlying  computer 
algorithm  design,  since  graphs  are  general  structures,  which  provide  a  convenient 
means  for  modeling  many  real-world  problems.  Our  motivation  for  choosing  the 
problems  we  chose  was  not  necessarily  because  they  arise  in  specific  applications, 
but  rather  that  the  known  sequential  algorithms  for  them  seem  hard  to  parallelize. 

It is interesting to point out two features that are shared by most of the
problems we considered, since they often come up in the study of parallel algorithms.

(1) The problem has a simple, even trivial, sequential algorithm. Transforming
this algorithm into a fast parallel algorithm turns out to be a big challenge,
and often a totally new approach is needed.

(2) The problem is a search or construction problem (e.g. "find a set with property
X") which has a related decision problem ("does there exist a set with
property X?"). The decision problem has a known fast parallel solution whereas
the search problem does not. This seemingly inherent difference between
decision and search problems does not usually come up in polynomial time
sequential computation. This is because much of the research in sequential
algorithms has been on search problems. Furthermore, in many cases there is
an obvious means of converting an algorithm for a decision problem to an
algorithm for the related search problem (using self reducibility). However,
this conversion seems inherently sequential and does not yield a fast parallel
algorithm. An interesting discussion appears in [KUW1].
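
The self-reducibility conversion mentioned in (2) can be sketched on a small example. The code below is only an illustration under stated assumptions: the problem chosen (finding a perfect matching), the brute-force oracle `has_perfect_matching` and the function `find_perfect_matching` are hypothetical names introduced here, not procedures from this thesis. The point to observe is that each oracle call depends on the outcome of the previous one, which is why the conversion looks inherently sequential.

```python
def has_perfect_matching(vertices, edges):
    """Decision oracle: brute-force stand-in for any decision procedure.

    Assumes a simple graph (no self-loops).
    """
    vertices = sorted(vertices)
    if not vertices:
        return True
    v = vertices[0]  # v must be matched to one of its neighbours
    for a, b in edges:
        if v in (a, b):
            u = b if a == v else a
            rest = [x for x in vertices if x not in (u, v)]
            rest_edges = [(x, y) for x, y in edges if x in rest and y in rest]
            if has_perfect_matching(rest, rest_edges):
                return True
    return False


def find_perfect_matching(vertices, edges):
    """Search via the decision oracle: drop every edge that is not needed."""
    if not has_perfect_matching(vertices, edges):
        return None
    kept = list(edges)
    for e in edges:
        trial = [f for f in kept if f != e]
        # Each query uses the result of all earlier queries (through `kept`).
        if has_perfect_matching(vertices, trial):
            kept = trial
    return kept


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
matching = find_perfect_matching([0, 1, 2, 3], cycle)
```

On the 4-cycle above the loop issues one oracle query per edge, one after the other; running the queries simultaneously would not work, because whether an edge may be dropped depends on which edges were dropped before it.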

There were several goals motivating the research reported here. First, as
mentioned above, the problems considered posed a challenge in that their known
sequential solutions seem hard to parallelize. Second, in the process of trying to
create parallel algorithms for problems, basic techniques for parallel computation
may be developed. By "basic techniques" we mean procedures that have
potentially wide applicability in helping to solve many problems. Examples of such
techniques in the literature are parallel "tree contraction" of Miller and Reif
([MR]), the "Euler Tour" technique for trees of Tarjan and Vishkin ([TV]) and
"independence systems" of Karp, Upfal and Wigderson ([KUW1]). Examples of
techniques developed in this thesis are: introducing the notion of a "dense
matching" with an efficient parallel implementation for it (chapter four) and the idea of
partitioning the edges of a graph into "constellations" (chapter five).

Another possible benefit from devising parallel algorithms is that new insight
can be obtained for sequential computation. The reason for this is that trying to
find a parallel solution to a problem seems to require, in many cases, directions
which are very different from the common sequential ones. On the other hand, a
parallel algorithm (in the model we will be using) can easily be converted into a
sequential one. Therefore, a new efficient parallel algorithm (where efficiency is
measured by the total number of operations performed by the algorithm) yields a
new sequential algorithm with good running time. For example, motivations from
parallel complexity led us to solve the minimum strong augmentation problem for
mixed graphs (chapter four), for which no previous polynomial-time sequential
algorithm was published. A sequential algorithm was discovered independently by
Gusfield ([Gusf]), who gives several applications of this problem.

The process of developing parallel algorithms can be broken down into two
main stages, analogous to the sequential case. First (at least in the problems we
considered) one needs to determine if the problem on hand has a fast parallel
solution. Here we need to define what "fast parallel" means. We will be using the (by
now standard) notion of NC (to be formally defined in chapter two) for this
purpose. This first stage can be very hard, and is analogous to showing that a
problem is in P. It is possible to give evidence that a problem is unlikely to be in NC
by showing it to be "P-complete" (see chapter two), analogously to marking a
problem as "probably hard" by proving it to be NP-complete. The second stage is to
find ways to improve the efficiency of the algorithm found in the first stage. This
is analogous to developing sophisticated data structures to push down the running
time of sequential algorithms as close as possible to optimal. In this context we
will define the notions of "efficient" and "optimal" parallel algorithms.

We now give a brief outline of the thesis. Chapter two contains a formal
description  of  the  concepts  which  are  the  foundations  of  the  research  reported  here. 
We  define  our  model  of  computation,  relevant  complexity  classes  and  efficiency  of 
algorithms.  We  also  mention  some  basic  graph  theoretic  terminology  and  notation 
we  will  be  using. 

Chapter  three  deals  with  tournaments,  which  are  digraphs  in  which  each  pair 
of  vertices  is  connected  by  one  arc.  The  first  half  of  this  chapter  has  algorithms  for 
constructing Hamiltonian paths and cycles in tournaments ([Soro1]). The second
half  describes  a  method  for  constructing  a  tournament  with  a  given  degree 
sequence  ([Soro3]).  This  problem  is  similar  to  the  problems  considered  in  chapter 
five,  but  the  techniques  for  solving  it  are  very  different. 
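
For contrast with the parallel algorithms of chapter three, the classical sequential argument that every tournament has a Hamiltonian path can be sketched as an insertion procedure. This is a minimal sketch of that well-known sequential method, not of the parallel algorithm developed in this thesis; the function name `hamiltonian_path` and the predicate `beats` are hypothetical and introduced only for illustration.

```python
def hamiltonian_path(vertices, beats):
    """Order `vertices` so that each vertex beats its successor.

    `beats(u, v)` is an assumed predicate that is True exactly when the arc
    between u and v is directed from u to v.
    """
    path = []
    for v in vertices:
        # Insert v just before the first vertex on the path that v beats.
        # Every vertex before that position beats v, so the arc entering v
        # (if there is one) is directed correctly.
        position = len(path)
        for i, u in enumerate(path):
            if beats(v, u):
                position = i
                break
        path.insert(position, v)
    return path


# A rotational tournament on 5 vertices: u beats u+1 and u+2 (mod 5).
n = 5
beats = lambda u, v: (v - u) % n in (1, 2)
path = hamiltonian_path(range(n), beats)
```

Each vertex is placed only after all earlier vertices have been placed, so the procedure is sequential by construction, which is the kind of obstacle the parallel algorithms have to get around.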

Chapter four describes solutions to several problems related to edge
orientation ([Soro2]). The first problem is to orient the edges of a mixed graph (i.e. a
graph with some directed and some undirected edges) so that the resulting digraph
is strongly connected. A graph for which such an orientation is possible is called
strongly orientable. Next we solve the problem of augmenting a graph (or
digraph) with a minimum number of edges to make it strongly orientable. Finally
we extend these methods to derive a solution for the minimum strong
augmentation problem for mixed graphs.

Chapter  five  contains  a  description  of  a  methodology  which  yields  fast  parallel 
algorithms  for  several  construction  problems  that  can  be  stated  as  supply-demand 
problems  with  zero-one  arc  capacities  ([NS]).  The  problems  solved  are  constructing 
a  zero-one  matrix  with  specified  row-  and  column-sums,  constructing  a  digraph 
with  given  in-  and  out-degree  sequences  and  constructing  a  zero-one  flow  pattern 
between a set of sites, each of which has a specified supply or demand. The
algorithms are extended to the case where the input specifies upper bounds on supplies
and lower bounds on demands.

CHAPTER TWO

FUNDAMENTAL  THEORETICAL  CONCEPTS 


In  this  chapter  we  give  a  formal  discussion  of  the  main  concepts  relevant  to  this 
thesis.  We  will  discuss  the  PRAM  model  of  computation,  the  complexity  classes 
NC  and  RNC  and  efficient  and  optimal  algorithms.  Finally,  we  will  mention  the 
basic  graph  theoretic  terminology  and  notation  we  will  be  using. 


2.1  The  PRAM  Model  of  Computation 

Many models of parallel computation have appeared in the literature (a good
survey appears in [Cook1]). In this manuscript we will be focusing on one model:
the Parallel Random Access Machine (PRAM). This model was defined by various
researchers (see e.g. [Vish1] for a survey), and is currently the prevailing model for
designing parallel algorithms. A PRAM contains a sequence of processors,
$P_1, P_2, P_3, \ldots$ and a shared memory. Each processor has a local memory and the
capabilities of the standard sequential random-access machine ([AHU]): it can, in
one step, perform a basic operation such as adding two numbers, comparing two
numbers, reading or writing to its local memory etc. A processor can also access
the shared memory in the manner described below. Every processor has a unique
integer index distinguishing it from all other processors.

The  PRAM  is  synchronous  -  the  computation  is  performed  in  parallel  steps, 
where  at  step  i  each  processor  performs  its  ith  computation  step.  In  each  parallel 
step  of  computation  every  processor  performs  one  of  three  primitive  operations:  it 
can  either  read  one  cell  of  the  shared  memory,  write  into  one  cell  of  the  shared 
memory  or  perform  a  local  computation.  If,  in  one  step,  some  cell  is  both  read 
from  and  written  into,  we  will  assume  that  the  read  occurs  before  the  write. 
PRAMs  are  categorized  according  to  the  allowable  configurations  in  accessing  the 
shared  memory. 

Exclusive/Concurrent Read: in an exclusive read PRAM, no two processors
attempt to read the same memory cell at the same step. An algorithm is valid
for this model only if it has this property. In a concurrent read machine any
number of processors can read a memory cell in one step.
Exclusive/Concurrent Write: an exclusive write PRAM is one in which no two
processors attempt to write into the same memory cell at the same step. In a
concurrent write model many processors can try to write into the same cell
simultaneously (i.e. in the same step). Here (as opposed to concurrent read), it is
not obvious what the result of a concurrent write should be, so a further
categorization is necessary:

COMMON:  all  processors  attempting  to  write  simultaneously  into  the  same 
cell  must  write  the  same  value. 

ARBITRARY:  one  of  the  processors  attempting  to  write  simultaneously  in 
one  cell  succeeds  (i.e.  its  value  gets  written).  The  algorithm  may  not 
make  any  assumptions  about  which  of  the  processors  succeeds. 

PRIORITY:  the  processor  with  the  lowest  index  of  those  attempting  to  write 
simultaneously  in  one  cell  succeeds. 

The type of PRAM is specified mnemonically, for example CREW means
concurrent read - exclusive write. It is clear that concurrent read (write) is at least as
powerful  as  exclusive  read  (write),  since  any  algorithm  for  one  can  also  run  on  the 
other.  Similarly,  PRIORITY  is  the  most  powerful  concurrent  write  model  (of  the 
ones  listed  above)  and  COMMON  is  the  weakest. 
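
The three concurrent-write rules can be made concrete with a small sequential simulation of one write step. This is a sketch under assumptions: the function name `resolve_concurrent_writes` and the representation of write requests as `(processor_index, cell, value)` tuples are hypothetical choices made here for illustration.

```python
def resolve_concurrent_writes(requests, rule):
    """Resolve the write requests issued in one parallel step.

    requests: list of (processor_index, cell, value) tuples (assumed format).
    rule: "COMMON", "ARBITRARY" or "PRIORITY".
    Returns a dict mapping each written cell to the value stored in it.
    """
    by_cell = {}
    for proc, cell, value in requests:
        by_cell.setdefault(cell, []).append((proc, value))

    result = {}
    for cell, writers in by_cell.items():
        if rule == "COMMON":
            values = {value for _, value in writers}
            if len(values) != 1:
                raise ValueError(f"COMMON rule violated at cell {cell}")
            result[cell] = values.pop()
        elif rule == "PRIORITY":
            # The writer with the lowest processor index succeeds.
            result[cell] = min(writers, key=lambda w: w[0])[1]
        elif rule == "ARBITRARY":
            # Any writer may succeed; this sketch happens to take the last
            # one, and a correct algorithm must not rely on that choice.
            result[cell] = writers[-1][1]
        else:
            raise ValueError(f"unknown rule {rule!r}")
    return result


requests = [(3, 0, "c"), (1, 0, "a"), (2, 1, "b")]
by_priority = resolve_concurrent_writes(requests, "PRIORITY")
```

A step that is legal under COMMON produces the same memory contents under all three rules, which matches the remark that COMMON is the weakest of the three.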

All the processors have the same program, but at a given time different
processors will be performing different operations since the program can refer to the
processors' identification numbers. The requirement of one program for all
processors makes the model more practical and also disallows unreasonable power, in
that it enforces a 'uniformity' condition (i.e. the same program works for different
input sizes) and implies that the program is finite. For example, if each processor
had a different program, there would exist a PRAM program that solves the
halting problem: the program of processor i would be: "if the input is i and Turing
machine i halts then output 'halts'; otherwise output 'doesn't halt'".

A PRAM computation works as follows: initially the input appears in the first
$n$ cells of shared memory. The first $p(n)$ processors are simultaneously "activated",
and run a common program as described above, where the function $p(n)$ depends
on the program. We will refer to this function as "the number of processors" used
by the algorithm. It is assumed that each processor knows the value $n$. At the
end of the computation the output should appear at a specified location (or set of
locations) in the shared memory. The number of steps until the last active
processor finishes running the program is the running time of the algorithm.

There is an obvious time-processor tradeoff which is important to point out.
Let $c > 1$ be a real number and say a certain PRAM program, A, runs in time $t$ and
uses $p$ processors. Then A can be modified to use only $\lceil p/c \rceil$ processors, and run in
time $O(ct)$. The modification is simply to have each processor in the modified
program do the work of $c$ processors in A. Therefore each step of A corresponds to $c$
consecutive steps in the modified program. As a special case of this, A yields a
sequential algorithm running in time $p \cdot t$.
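
The bookkeeping behind this tradeoff can be sketched in a few lines. The function name `simulate_with_fewer_processors` is hypothetical, and the sketch only counts processors and steps under the assumption that simulating one step of one original processor costs one step; it does not model the program itself.

```python
import math


def simulate_with_fewer_processors(p, t, c):
    """Return (processors, time) after slowing a (p, t) program down by c.

    p: processors used by the original program, t: its running time,
    c: a real number greater than 1.
    """
    q = math.ceil(p / c)                # processors in the modified program
    rounds_per_step = math.ceil(p / q)  # original processors handled by each
    return q, t * rounds_per_step


quarter = simulate_with_fewer_processors(1000, 10, 4)
sequential = simulate_with_fewer_processors(1000, 10, 1000)
```

Since `rounds_per_step` is at most $\lceil c \rceil < 2c$, the modified running time stays within $O(ct)$, and taking $c = p$ leaves a single processor running for $p \cdot t$ steps.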

A  note  about  word  size.  We  will  assume  that  a  memory  cell  (shared  or  local) 
is of size $c \log_2 n$ (for some constant, $c$, depending on the algorithm). Consequently,
basic operations (addition, comparison) on integers up to $n^c$ can be performed in
one  step.  One  reason  for  making  this  assumption  is  that  we  will  be  dealing  with 
graphs,  and  usually  the  primitive  elements  will  be  names  of  vertices  and  edges. 

Randomization: As in sequential computation, the notion of randomized
algorithms helps in solving certain problems. To this end we define a
probabilistic PRAM. In this model each processor has the additional capability of
generating random numbers. More specifically, we will assume that in one step a
processor can generate a random word, i.e. an integer chosen uniformly in the
range $[0, n^c]$. Again, a probabilistic PRAM can be either exclusive or concurrent
read and write.


2.2  Complexity  Classes 

In our study of parallel algorithms we need to define appropriate complexity
classes to express the notion of a "fast parallel algorithm". The class NC is
commonly used for this purpose. The class NC was first identified and characterized
by Pippenger ([Pipp]). The name "NC" was coined by Cook (see e.g. [Cook1]) and
stands for "Nick's Class", giving credit to Nick Pippenger.

Definition: A decision problem is in NC if there exists a PRAM algorithm solving
it that runs in time $O(\log^{c_1} n)$ using $O(n^{c_2})$ processors, where $n$ is the size of the
input and $c_1$ and $c_2$ are constants independent of $n$.


First we note that NC is defined for decision problems (i.e. problems for which
the output is one bit). We will use the term more loosely and talk about an
NC algorithm, i.e. a PRAM algorithm that obeys the appropriate time and
processor constraints.
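
As a small illustration of an algorithm that meets these time and processor constraints, the following sketch simulates, on one sequential machine, a tree-style summation that a PRAM could carry out in about $\log_2 n$ parallel steps with at most $n/2$ processors. The function name `parallel_sum` is hypothetical, and summation is used here only as a familiar example; it is not one of the problems studied in this thesis.

```python
def parallel_sum(values):
    """Simulate a logarithmic-depth summation; return (sum, parallel steps).

    Assumes `values` is non-empty.
    """
    cells = list(values)  # plays the role of the shared memory
    n = len(cells)
    rounds = 0
    stride = 1
    while stride < n:
        # One synchronous step: every active processor i first reads
        # cells[i] and cells[i + stride], then all results are written.
        # Distinct processors touch disjoint cells, so reads and writes
        # are exclusive.
        updates = {i: cells[i] + cells[i + stride]
                   for i in range(0, n - stride, 2 * stride)}
        for i, v in updates.items():
            cells[i] = v
        stride *= 2
        rounds += 1
    return cells[0], rounds


total, steps = parallel_sum([3, 1, 4, 1, 5, 9, 2, 6])
```

The number of simulated steps grows like $\log_2 n$ while the number of processors needed in any step is at most $n/2$, which is well within the bounds required of an NC algorithm.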

The next thing to note is that the type of PRAM is not specified in the
definition. The reason is that the strongest PRAM model (PRIORITY CRCW) can
be efficiently simulated by the weakest model (EREW). More precisely, for any $p$
and $t$, for any PRIORITY CRCW algorithm that runs in time $t$ and uses $p$
processors, there exists an equivalent EREW algorithm that runs in time $O(t \log p)$ and
uses $p$ processors. (Equivalent means having the same input-output
correspondence.) The simulation (due to Vishkin) involves sorting all the accesses to the
shared memory, thus detecting the highest-priority processor writing into each cell
and providing for concurrent writes. Since sorting can be done in time $O(\log n)$
using a linear number of processors on an EREW PRAM ([Cole]), the simulation
achieves the complexity stated above. It is standard to make a finer classification
within NC:

Definition: A decision problem is in $AC^k$ ($k \ge 1$) if there exists a PRIORITY CRCW
PRAM algorithm solving it that runs in time $O(\log^k n)$ using $O(n^c)$ processors,
where $n$ is the size of the input and $c$ is a constant independent of $n$. A problem is
in AC if it is in $AC^k$, for some $k \ge 1$.
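
The sorting idea behind simulating PRIORITY concurrent writes without concurrent access can be sketched as follows. This is an illustration only: the function name `priority_write_by_sorting` and the `(processor_index, cell, value)` request format are assumptions made here, and Python's built-in `sorted` merely stands in for a parallel sorting algorithm.

```python
def priority_write_by_sorting(requests):
    """Pick, for every cell, the value of its lowest-indexed writer.

    requests: list of (processor_index, cell, value) tuples (assumed format).
    """
    # Sort by cell, breaking ties by processor index, so that the
    # highest-priority writer of each cell comes first in its group.
    ordered = sorted(requests, key=lambda r: (r[1], r[0]))
    winners = {}
    for k, (proc, cell, value) in enumerate(ordered):
        # On a PRAM, position k would be handled by its own processor,
        # which inspects only its own entry and its left neighbour.
        if k == 0 or ordered[k - 1][1] != cell:
            winners[cell] = value
    return winners


winners = priority_write_by_sorting([(3, 0, "c"), (1, 0, "a"), (2, 1, "b")])
```

After sorting, deciding who wins is a purely local test between adjacent positions, so the only step that needs more than constant parallel time is the sort itself.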


It is clear by the statements above that AC = NC. We point out that in the
literature NC is commonly defined in terms of another computational model,
uniform circuit families. When defining NC using this model, one gets a finer
classification within NC into classes $NC^k$ (along the lines of the classes $AC^k$
within AC). We will not elaborate on this here, since the definitions given above
are sufficient for our purposes. For more details regarding other models and
classes related to NC see [Cook1] and [Cook2].

It is clear by the definition (and by a previous remark about time-processor
tradeoff) that NC is contained in P, the class of problems solvable in (sequential)
polynomial time. It is generally believed to be strictly contained, but researchers
are currently very far from proving general lower bounds that would imply this
separation. However, because of this belief, there is a way of "giving strong
evidence" that a problem in P is not in NC, by showing that its membership in NC
would imply P = NC. Such a problem is log-space complete for P or simply
P-complete.

A log-space reduction is a transformation between problems computable by a
Turing machine with logarithmic work space (details in [GJ]). A problem, X, is
P-complete if $X \in P$ and every problem in P is log-space reducible to X. As in the
theory of NP-completeness, one shows that a problem is P-complete by
demonstrating a (log-space) reduction from a problem that is known to be P-complete to it.
Quite a few problems have been shown to be P-complete. The circuit value
problem (given the description of a boolean circuit and values for its inputs, what is
the value of the output?) was shown to be P-complete by a "generic reduction" (a
reduction from any problem in P given a Turing machine solving it in time
bounded by some polynomial in the input length), and plays a similar role to SAT
in the theory of NP-completeness ([Ladner]). Other interesting examples of
P-complete problems are max flow ([GSS]) and finding the lexicographically first
maximal clique ([Cook2]).
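
To make the circuit value problem concrete, here is a minimal sequential evaluator. The representation of a circuit as a list of `(name, op, args)` gates given in topological order, and the function name `circuit_value`, are assumptions made for this sketch rather than a format used in the thesis.

```python
def circuit_value(gates, inputs):
    """Evaluate a boolean circuit and return the value of its last gate.

    gates: list of (name, op, args) in topological order, op being
           "AND", "OR" or "NOT" (assumed encoding).
    inputs: dict mapping input names to booleans.
    """
    value = dict(inputs)
    for name, op, args in gates:
        operands = [value[a] for a in args]
        if op == "AND":
            value[name] = all(operands)
        elif op == "OR":
            value[name] = any(operands)
        elif op == "NOT":
            value[name] = not operands[0]
        else:
            raise ValueError(f"unknown gate type {op!r}")
    return value[gates[-1][0]]


gates = [("g1", "OR", ["x", "y"]),
         ("g2", "NOT", ["y"]),
         ("out", "AND", ["g1", "g2"])]
result = circuit_value(gates, {"x": True, "y": False})
```

The evaluator visits the gates one at a time, each depending on earlier results; the P-completeness of the problem is evidence that, for circuits of large depth, this gate-by-gate dependence cannot be removed by an NC algorithm unless P = NC.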

How is a log-space reduction relevant for parallel computation? It turns out
that there is a close relationship between sequential space and parallel time
known as the "parallel computation thesis" (see e.g. [FW]). A consequence of this
is that a log-space reduction can be computed in NC. Thus if a P-complete
problem is in NC then every problem in P is in NC. It is, therefore, unlikely that a
P-complete problem is in NC.

The probabilistic counterpart of NC is Random NC (RNC).

Definition: A decision problem is in RNC if there is a probabilistic PRAM
algorithm that runs in polylog time using a polynomial number of processors which, on
any input, gives the correct answer with probability at least 3/4 ([Cook2]).
The definition can be extended to problems with many output bits by saying
that the algorithm gives a correct output sequence with probability at least 3/4. An
algorithm of the type appearing in the definition (i.e. one that can make errors) is
known as a Monte Carlo algorithm. A more powerful notion is a probabilistic
algorithm that makes no errors, known as a Las Vegas algorithm. In this case the
random variable is the running time of the algorithm, whose expected value is (for
our purposes) polylog in the input size.


2.3  Efficient  and  Optimal  Algorithms 

We have identified the notion of problems having fast parallel algorithms
with the class NC. However, it is not clear that an NC algorithm would be
practical, since the number of processors in an actual machine does not change as a
function of the input size. For example, say the best NC algorithm we have for some
problem uses $n^3$ processors and the same problem has a sequential algorithm that
runs in time $n$. If we have an actual machine with, say, 1000 processors then for
instances larger than 31 the sequential algorithm would run faster using one
processor than the parallel algorithm using all 1000 processors (implementing the
processor-time tradeoff mentioned earlier). We, therefore, need a more precise
notion of what constitutes a good parallel algorithm.
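
The threshold in this example can be checked with a short calculation. The function name `crossover` is hypothetical, and the sketch assumes, in favour of the parallel algorithm, that it needs only a single parallel step; the real algorithm takes polylogarithmically many steps, which only makes the comparison worse for it.

```python
def crossover(machine_processors, parallel_processors, sequential_time):
    """Smallest n at which simulating one parallel step on the machine
    already costs more than the whole sequential algorithm."""
    n = 1
    while True:
        needed = parallel_processors(n)
        # Steps the machine needs to simulate one step of `needed` processors.
        slowdown = (needed + machine_processors - 1) // machine_processors
        if slowdown > sequential_time(n):
            return n
        n += 1


threshold = crossover(1000, lambda n: n ** 3, lambda n: n)
```

With 1000 processors, $n^3$ simulated processors and sequential time $n$, the condition $n^3/1000 > n$ first holds once $n^2 > 1000$, that is, for instances larger than 31.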

Definition: Let $T(n)$ be the running time of the fastest sequential algorithm
known for solving problem X, and let A be a PRAM algorithm solving X that runs
in time $t(n)$ and uses $p(n)$ processors. Then:

(1) A is efficient if $p(n) \cdot t(n) = O(T(n) \cdot \log^c n)$ (for some constant, $c$).

(2) A is optimal (or achieves optimal speedup) if $p(n) \cdot t(n) = O(T(n))$.

The  first  thing  to  note  is  that  this  definition  is  quite  pragmatic  -  it  evaluates 
an  algorithm  according  to  the  current  state  of  knowledge  (i.e.  the  best  sequential 
algorithm  known),  which  is  not  necessarily  well-defined  theoretically,  but  makes 
sense  practically.  The  definition  is  theoretically  sound  for  problems  which  are 
known to have matching upper and lower sequential bounds.

Another point regarding this widely used definition is that an algorithm that
achieves optimal speedup does not necessarily have a high degree of parallelism.
For example, the best sequential algorithm is optimal by the above definition.
Therefore a more meaningful notion for our purposes is an efficient (optimal) NC
algorithm (i.e. an NC algorithm that is efficient or optimal).

Some basic problems that have efficient NC algorithms are: computing
connected components of an undirected graph ([SV]), computing biconnected
components ([TV]), constructing a maximal matching in a graph ([IS]) and dynamic
expression evaluation ([MR]). Problems for which optimal NC algorithms are
known include sorting ([Cole]) and prefix computations (e.g. [Fich]).


2.4  Terminology  and  Notation 

The  graph-theoretic  terminology  we  use  is  mostly  standard.  We  give  here 
some  of  the  main  definitions  (to  fix  terminology  rather  than  to  introduce  concepts). 
For  a  wider  discussion  see  [CL]  or  [Berge].  More  specialized  definitions  (e.g.  for 
tournaments  and  mixed  graphs)  will  be  given  in  the  appropriate  chapters. 

An (undirected) graph G = (V,E) consists of a set of vertices, V, and a set of
edges, E. V(G) and E(G) denote, respectively, the vertex set and edge set of G.
An edge, e = {u,v}, is an unordered pair of vertices (u and v are the endpoints of
e, and e is incident to u and v). The degree of a vertex is the number of edges
incident to it. Two vertices, u and v, are adjacent (or neighbors) if {u,v} ∈ E. A
path is a sequence of edges {v_1,v_2}, {v_2,v_3}, ..., {v_{k-1},v_k} where
v_i ≠ v_j for all i ≠ j. If v_1 = v_k then this sequence of edges is a cycle. A
path can also be expressed by the sequence of vertices along it. A forest is a
graph with no cycles. Vertices x and y are in the same connected component if
they lie on some path. G is connected if all the vertices are in the same
connected component. A tree is a connected graph with no cycles. A bipartite
graph is a graph whose vertex set is partitioned into two sets, X and Y, such
that every edge is incident to a vertex in X and a vertex in Y.

A directed graph (or digraph), G = (V,E), consists of a set of vertices, V,
and a set of arcs, E. V(G) and E(G) denote, respectively, the vertex set and arc
set of G. An arc, a = (u,v), is an ordered pair of vertices (u is the tail of a
and v is its head). The in-degree d_in(v) (out-degree d_out(v)) of v is the number
of arcs whose head (tail) it is. A (directed) path from v_1 to v_k is a sequence
of arcs (v_1,v_2), (v_2,v_3), ..., (v_{k-1},v_k) where v_i ≠ v_j for all i ≠ j.
If v_1 = v_k then this sequence is a (directed) cycle. Vertices x and y are in the
same strongly connected component if there is a path from x to y and from y to x.
G is strongly connected if all the vertices are in the same strongly connected
component.

The following definitions are appropriate for both graphs and digraphs (with
"edge" representing both edges and arcs). The number of vertices and edges will
usually be denoted by n and m respectively. A subgraph of G is a graph, H,
whose vertex set is a subset of V(G) and edge set is a subset of E(G). H is a
spanning subgraph of G if V(H) = V(G). A Hamiltonian path (cycle) is a
spanning path (cycle). For a set of vertices, U, the induced subgraph on U
(denoted by G(U)) is the graph on the vertex set U whose edge set is
{{x,y} | x,y ∈ U and {x,y} ∈ E(G)}.

CHAPTER THREE

EFFICIENT  ALGORITHMS  FOR  TOURNAMENTS 


3.1  Introduction 

A tournament is a directed graph T = (V,E) in which any pair of vertices is
connected by exactly one arc. This models a competition involving n players,
where every player competes against every other one. A trivial but useful fact is
that any induced subgraph of a tournament is also a tournament. If (u,v) ∈ E we
will say that u dominates v, and denote this property by u > v. Note that since
the directions of the arcs are arbitrary, the domination relation is not
necessarily transitive. We extend the notion of domination to sets of vertices:
let A, B be subsets of V. A dominates B (A > B) if every vertex in A dominates
every vertex in B.

For a given vertex, v, we categorize the rest of the vertices according to their
relation with v: W(v) is the set of vertices that are dominated by v (i.e. vertices
involved in matches which v Won) and L(v) is the set of vertices that dominate v
(matches which v Lost).

The  transitive  tournament  on  n  vertices  is  the  tournament  in  which  each 
integer  between  1  and  n  has  a  corresponding  vertex,  and  i  dominates  j  if  i>j. 
The  score  of  a  vertex  is  the  number  of  vertices  it  dominates.  The  score  list  of  a 
tournament  is  the  sorted  list  of  scores  of  its  vertices  (starting  with  the  lowest). 
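
To make these definitions concrete, here is a small Python sketch. The
representation (a dictionary `dom` mapping each vertex to the set of vertices
it dominates) and all function names are assumptions of the sketch, not
notation used elsewhere in this chapter.

```python
def transitive_tournament(n):
    """Transitive tournament on vertices 1..n: i dominates j iff i > j.

    dom[v] is the set of vertices that v dominates (sketch convention).
    """
    return {i: set(range(1, i)) for i in range(1, n + 1)}


def W(dom, v):
    """Vertices dominated by v (matches v won)."""
    return set(dom[v])


def L(dom, v):
    """Vertices that dominate v (matches v lost)."""
    return {u for u in dom if v in dom[u]}


def score_list(dom):
    """Scores of all vertices, sorted starting with the lowest."""
    return sorted(len(out) for out in dom.values())
```

For the transitive tournament on 4 vertices, vertex 3 has W(3) = {1, 2} and
L(3) = {4}, and the score list is 0, 1, 2, 3.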

Tournaments have been extensively studied (e.g. [BW], [Moon]). In this
chapter we will deal with several classical results. The first set of results
concerns the existence of Hamiltonian paths and cycles in tournaments: every
tournament has a Hamiltonian path, and every strongly connected tournament has a
Hamiltonian cycle. These theorems are in contrast with the fact that deciding if
an arbitrary graph is Hamiltonian is NP-complete [GJ]. The proofs of these
theorems in the literature imply efficient algorithms for finding these objects,
but since the proofs are by induction, the algorithms seem inherently sequential.
A natural question is - can Hamiltonian paths and cycles in tournaments be found
quickly in parallel? We answer in the affirmative by giving NC algorithms for
both problems. In the process of giving the algorithms we demonstrate new proofs
of the theorems.

At  first  we  show  how  to  find  a  Hamiltonian  path.  A  similar  algorithm  was 
discovered independently by J. Naor ([Naor]). Our solution uses an interesting
technical  lemma,  which  states  that  in  every  tournament  there  is  a  "mediocre" 
player  -  one  that  has  both  lost  and  won  many  matches.  We  also  give  a  very  simple 
and  efficient  randomized  algorithm,  which  raises  some  interesting  issues. 

We then move to the Hamiltonian cycle problem, which turns out to be quite
a bit more complicated. The main idea in the solution is defining a new problem -
that of finding a Hamiltonian path with one fixed endpoint - and solving it
simultaneously with finding a Hamiltonian cycle, using a "cut and paste"
technique.

The other main part of this chapter deals with the construction problem:
given a non-decreasing list of integers, S = s_1, ..., s_n, determine if there
exists a tournament with score list S, and if so, construct such a tournament.
A simple, non-constructive criterion for testing if such a list is a score list
was found by Landau in 1953 ([BW]): S is a score list of some tournament if and
only if, for all k, 1 ≤ k ≤ n:

    \sum_{i=1}^{k} s_i \ge \binom{k}{2},

with equality for k = n.
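
Landau's criterion translates directly into a linear scan over the sorted
list. The following Python sketch (the function name is ours) checks the
condition sequentially; it is meant only to illustrate the test, not the
parallel implementation discussed later.

```python
from math import comb


def is_score_list(scores):
    """Return True iff the integers in scores satisfy Landau's criterion."""
    s = sorted(scores)              # non-decreasing order
    total = 0
    for k, s_k in enumerate(s, start=1):
        total += s_k                # total = s_1 + ... + s_k
        if total < comb(k, 2):      # the k lowest scores must cover C(k, 2) arcs
            return False
    return total == comb(len(s), 2)  # equality is required for k = n
```

For example, 1, 1, 1 (a directed triangle) passes the test, while 0, 0, 2
fails already at k = 2.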

A simple greedy algorithm ([BW],[CL]) is known for constructing a tournament
with v_i having score s_i (for all 1 ≤ i ≤ n): select some score s_i and remove it
from the list; have v_i dominate the s_i vertices with smallest scores (and have
the rest of the vertices dominate v_i); subtract 1 from the score of each vertex
dominating v_i and repeat this procedure for the reduced list. We note that very
similar algorithms exist for several other construction problems, some of which
are discussed in chapter 5 of this thesis ([Berge],[CL],[FF]).
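
A sequential rendering of this greedy procedure might look as follows. This is
a sketch under our own conventions: vertices are numbered 0..n-1, the "some
score" selected in each round is simply the first remaining vertex, and an arc
(u, v) in the returned set means that u dominates v.

```python
def greedy_tournament(scores):
    """Build a tournament in which vertex i has score scores[i].

    Returns a set of arcs (u, v), meaning u dominates v.
    Raises ValueError if the greedy procedure gets stuck.
    """
    remaining = dict(enumerate(scores))      # vertex -> score still to realize
    arcs = set()
    while remaining:
        v = next(iter(remaining))            # select some vertex (here: the first)
        s = remaining.pop(v)
        others = sorted(remaining, key=remaining.get)   # smallest scores first
        if s < 0 or s > len(others):
            raise ValueError("not a score list")
        for u in others[:s]:                 # v dominates the s smallest scores
            arcs.add((v, u))
        for u in others[s:]:                 # the rest dominate v ...
            arcs.add((u, v))
            remaining[u] -= 1                # ... and spend one win on v
    return arcs
```

On the list 2, 2, 2, 2, 2 this produces a regular tournament on five vertices;
on 0, 0, 2 it raises an error, in agreement with Landau's criterion.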

Checking the set of conditions described above is easy to do efficiently in
parallel, but implementing the construction algorithm seems hard. We give an
alternate method, which yields an optimal NC algorithm for the construction
problem.

The algorithms presented in this chapter are efficient, some optimal. All the
deterministic algorithms use O(n^2/log n) processors on a CREW PRAM, where n is
the number of vertices. They are efficient since the size of the tournament is
Ω(n^2). The algorithms for Hamiltonian path and cycle run in time O(log^2 n). The
algorithm for the tournament construction problem runs in time O(log n).
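
As a worked check of these claims against the definition in section 2.3, the
processor-time products can be written out as follows. We take T(n) = Θ(n^2),
the size of the tournament, as the sequential baseline; this is the baseline
used in the paragraph above, and treating it as T(n) for every problem is an
assumption of this illustration.

```latex
\begin{align*}
\text{Hamiltonian path and cycle:}\quad
  p(n)\,t(n) &= \frac{n^2}{\log n}\cdot\log^2 n = n^2\log n
              = O\bigl(T(n)\log n\bigr)
  && \text{(efficient, with } c = 1\text{)}\\
\text{Tournament construction:}\quad
  p(n)\,t(n) &= \frac{n^2}{\log n}\cdot\log n = n^2
              = O\bigl(T(n)\bigr)
  && \text{(optimal)}
\end{align*}
```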

The randomized algorithm for the Hamiltonian path problem runs in expected
time O(log n) on a CRCW PRAM and uses only O(n) processors! At first sight
this seems "better than optimal", but it is observed that only O(n log n) arcs
need to be inspected in order to find a Hamiltonian path. However, for all the
other problems considered in this chapter, Ω(n^2) lower bounds hold: in finding
strongly connected components and Hamiltonian cycles Ω(n^2) arcs need to be
inspected (as can be shown by a simple adversary strategy); in the tournament
construction problem the output size is Θ(n^2). Thus the algorithms described for
these problems are, indeed, efficient or optimal.

We start with a discussion of the structure of the strongly connected
components of a tournament, which will be useful in later sections. This
structure has some nice properties which give rise to an optimal NC algorithm for
finding the strongly connected components. In contrast, the most efficient
parallel algorithm known for determining strongly connected components in
general digraphs is to compute the transitive closure, and is not optimal. This
is discussed in more detail in the introduction to chapter 4.

3.2  Strongly  Connected  Components  of  a  Tournament 

An important computation required by our algorithms is finding strongly
connected components in a tournament. In this section we show some special
properties of the strong component structure of tournaments and give a simple
optimal NC algorithm for finding it based on these properties. In a nutshell,
the strong component structure of a tournament depends only on its score list.

The  first  simple  lemma  shows  that  there  is  a  total  ordering  of  the  strongly 
connected  components. 

Lemma 3.1: Let T be a tournament and let C_1, C_2, ..., C_k be its strongly
connected components. Then for all i ≠ j either C_i > C_j or C_j > C_i (recall
that A > B means that every vertex in A dominates every vertex in B).

Proof: By definition of strongly connected components all arcs between C_i and
C_j go in the same direction. Since T is a tournament all such arcs exist. □

The implication of this lemma is that in order to describe the strong
component structure we need only specify the partition of vertices into strongly
connected components and the total order of the components (as opposed to a
general digraph for which there is only a partial ordering of the strongly
connected components). The next lemma shows that this partition is related to the
score list.

Lemma 3.2: Let u,v ∈ V and say s(u) ≥ s(v) (where s(u) is the score of vertex u).
Let S_u and S_v be the strongly connected components containing u and v
respectively. Then either S_u = S_v or S_u > S_v.

Proof: If u > v the claim clearly holds. If v > u, then by the pigeonhole principle
there must be a vertex, w, such that u > w and w > v. Thus the claim holds in this
case too. □


Let V = {v_1, ..., v_n}, where s(v_1) ≤ s(v_2) ≤ ... ≤ s(v_n). Lemma 3.2 tells us
that each strongly connected component is of the form v_j, v_{j+1}, ..., v_k, for
some j,k. How can we determine where a strongly connected component starts and
where it ends? This turns out to be simple: If v_k is the vertex of highest score
in a strongly connected component, then v_i < v_j for all i ≤ k < j. It follows
that \sum_{i=1}^{k} s(v_i) - the number of arcs whose tail is in the set
{v_1, ..., v_k} - is equal to \binom{k}{2} - the number of arcs in the tournament
induced on this vertex set. The converse is also true: if
\sum_{i=1}^{k} s(v_i) > \binom{k}{2}, then v_k and v_{k+1} are in the same
strongly connected component.

Thus the strongly connected components can be computed by the following
algorithm:


procedure STRONG_COMPONENTS(T)

(1) In parallel for all vertices, v, compute the score, s(v), of v.

(2) Sort the sequence of scores in non-decreasing order to obtain the score list,
    S = s_1, s_2, ..., s_n of T.

(3) Compute the partial sums,

        p_k = \sum_{i=1}^{k} s_i - \binom{k}{2}

    for all 1 ≤ k ≤ n in parallel.

(4) Partition the vertices into strongly connected components according to the
    zeroes in the sequence p_1, p_2, ..., p_n: the vertex with score s_k is the
    last (i.e. of highest score) vertex in a strongly connected component if and
    only if p_k = 0.

end STRONG_COMPONENTS.
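
The four steps can be mirrored by a short sequential Python sketch. The input
format (a dictionary `dom` mapping each vertex to the set of vertices it
dominates) and the function name are our assumptions; the partial sums are
accumulated in a loop rather than by a parallel prefix computation.

```python
def strong_components(dom):
    """Strongly connected components of a tournament, lowest scores first.

    dom[v] is the set of vertices dominated by v. Each component in the
    returned list is dominated by every component that follows it.
    """
    score = {v: len(out) for v, out in dom.items()}       # step (1)
    order = sorted(dom, key=score.get)                    # step (2)
    comps, current, p = [], [], 0
    for k, v in enumerate(order, start=1):
        p += score[v] - (k - 1)      # step (3): p_k, since C(k,2) - C(k-1,2) = k-1
        current.append(v)
        if p == 0:                   # step (4): a component ends at a zero
            comps.append(current)
            current = []
    return comps
```

For a directed triangle on {0, 1, 2} together with a vertex 3 that dominates
all of them, the scores are 1, 1, 1, 3 and the partial sums are 1, 1, 0, 0, so
the sketch reports the components {0, 1, 2} and {3}.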


It is not hard to see that this procedure can be performed using O(n^2/log n)
processors in time O(log n) on a CREW PRAM. In fact, if the scores are given then
only O(n) processors are required (using Cole's sorting algorithm for step (2)
[Cole]).

To summarize, the structure of the strongly connected components of a
tournament depends only on its score list. This may seem surprising, since Ω(n^2)
bits are required to specify a tournament on n vertices, but only O(n log n) bits
are needed to specify its score list.


3.3  Hamiltonian  Paths  and  Cycles 
3.3.1  Hamiltonian  Path 

We  start  by  stating  the  theorem  due  to  Redei  [Redei]  and  its  textbook  proof 
([Rober]  page  487). 

Theorem  3.1:  Every  tournament  contains  a  Hamiltonian  path. 

Proof: By induction on the number, n, of vertices. The result is clear for n = 2.
Assume it holds for tournaments on n vertices. Consider a tournament, T, on n+1
vertices. Let v be an arbitrary vertex of V(T). By assumption G(V - {v}) has a
Hamiltonian path v_1, v_2, ..., v_n. If v > v_1 then v, v_1, ..., v_n is a
Hamiltonian path of T. If v < v_n then v_1, ..., v_n, v is a Hamiltonian path of
T. Otherwise there must exist an index, i < n, such that v_i > v and v_{i+1} < v.
In this case, v_1, ..., v_i, v, v_{i+1}, ..., v_n is the desired Hamiltonian
path. □

This proof yields a very efficient sequential algorithm for finding a
Hamiltonian path in a tournament. The important observation is that adding a new
vertex to a path of length l can be done by inspecting only O(log l) arcs using
binary search. This shows that only O(n log n) arcs need to be inspected in the
worst case in order to find a Hamiltonian path. A matching lower bound can be
obtained by an analogy to sorting: when the given tournament is transitive,
finding a Hamiltonian path in it is equivalent to sorting n integers, since
inspecting an arc corresponds to a comparison, and the Hamiltonian path
corresponds to the sorted sequence. Since Ω(n log n) comparisons are required for
sorting ([AHU]), it follows that Ω(n log n) arcs need to be inspected to determine
a Hamiltonian path in a tournament.
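
The sequential algorithm implied by the proof can be sketched in Python as
follows. The names and the `dom` representation (dom[u] is the set of vertices
dominated by u) are assumptions of the sketch. The binary search maintains the
invariant that the vertex just left of the search window dominates v and v
dominates the vertex just right of it, which is all that the proof requires.

```python
def hamiltonian_path_by_insertion(vertices, dom):
    """Insert vertices one at a time, locating each position by binary search.

    Returns (path, inspected), where inspected counts the arcs looked at.
    """
    path = []
    inspected = 0
    for v in vertices:
        lo, hi = 0, len(path)
        while lo < hi:
            mid = (lo + hi) // 2
            inspected += 1
            if v in dom[path[mid]]:   # path[mid] dominates v: insert to its right
                lo = mid + 1
            else:                     # v dominates path[mid]: insert to its left
                hi = mid
        path.insert(lo, v)
    return path, inspected
```

Note that `path.insert` itself takes linear time in Python; the sketch is
meant to illustrate the number of arcs inspected, not the running time.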

In order to obtain a fast parallel algorithm a different method seems to be
required. The approach we take is divide and conquer. A simple-minded way is
the following: (i) Split the tournament into two subgraphs, T_1, T_2, of roughly
equal order. (ii) In parallel, find Hamiltonian paths H_1 in T_1 and H_2 in T_2.
(iii) Connect H_1 and H_2 to form a Hamiltonian path of T.

The problem with this approach is that step (iii) is not guaranteed to succeed,
since we have no control over what the endpoints of H_1 and H_2 are.

It turns out that a slightly modified approach does work. The key observation
is the following: let v be a vertex of T. Consider Hamiltonian paths
l_1, ..., l_k of L(v) and w_1, ..., w_m of W(v). Since l_k > v and v > w_1 we can
obtain the following Hamiltonian path of T: l_1, ..., l_k, v, w_1, ..., w_m. Note
that this provides an alternative, simpler proof of theorem 3.1.

In order to derive an NC algorithm from the above idea, we need the following
technical lemma:

Lemma 3.3 (Mediocre player lemma): In a tournament, T, on n vertices there
exists a vertex, v, for which both L(v) and W(v) have at least ⌊n/4⌋ vertices.

Proof: We point out that this lemma is a corollary of lemma 3.4. The proof given
here is of interest because it shows why the constant 1/4 comes up and implies
what the worst-case tournaments are. Let

    I = {u | d_in(u) ≥ d_out(u)}
    O = V - I.

Assume w.l.o.g. that |I| ≥ |O|. By the pigeonhole principle there exists a vertex,
v ∈ I, whose out-degree in T(I) is no less than its in-degree in T(I). Thus

    d_out(v) ≥ ⌊|I|/2⌋ ≥ ⌊n/4⌋

and d_in(v) ≥ d_out(v) ≥ ⌊n/4⌋ by definition. □

Remark: A simple construction shows that for every n there are tournaments on n
vertices for which each vertex has either in-degree or out-degree ⌊n/4⌋.

Using lemma 3.3 we obtain our algorithm:

procedure PATH(T)

(1) Let n = order of T.

(2) If n = 1 then return the unique vertex of T.

(3) Find a vertex, v, whose in-degree and out-degree in T are both at least
    ⌊n/4⌋.

(4) In parallel find H_1 = PATH(L(v)) and H_2 = PATH(W(v)).

(5) Return the path (H_1, v, H_2).
end PATH.
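
A sequential simulation of procedure PATH is sketched below in Python. The
`dom` representation (dom[u] is the set of vertices dominated by u) and the
linear scan used to find a mediocre vertex are our assumptions; the two
recursive calls are independent and are simply executed one after the other.
The sketch also returns an empty path for an empty sub-tournament, a case that
arises when L(v) or W(v) is empty.

```python
def path(vertices, dom):
    """Hamiltonian path of the sub-tournament induced on vertices."""
    vs = list(vertices)
    n = len(vs)
    if n <= 1:                       # steps (1)-(2), plus the empty case
        return vs
    for v in vs:                     # step (3): look for a mediocre vertex
        won = [u for u in vs if u in dom[v]]     # W(v) within this sub-tournament
        lost = [u for u in vs if v in dom[u]]    # L(v) within this sub-tournament
        if min(len(won), len(lost)) >= n // 4:
            break                    # exists by the mediocre player lemma
    # steps (4)-(5): everything in L(v) dominates v, and v dominates W(v)
    return path(lost, dom) + [v] + path(won, dom)
```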

By lemma 3.3, step (3) can be achieved, so only O(log n) levels of recursive
calls (step (4)) are required. The time required for one level is O(log n) on a
CREW PRAM (partitioning the vertices and updating their degrees). Therefore the
total running time is O(log^2 n). The number of processors required is
O(n^2/log n) (since the degree of a vertex can be computed in O(log n) time with
O(n/log n) processors by standard methods).

The main obstacle in making this algorithm more efficient is the need to
compute the in- and out-degrees of the vertices and to update them in every step.
In fact it can be shown that any deterministic algorithm for finding a "mediocre"
vertex in a tournament needs to inspect Ω(n^2) arcs in the worst case. We do not
know, at this point, if there is a deterministic NC algorithm for the Hamiltonian
path problem that uses only O(n) processors, but there is a simple randomized
scheme based on the observation that "most vertices are mediocre". This is
formally stated in the following lemma, which is a generalization of lemma 3.3:

Lemma 3.4: In a tournament, T, on n vertices there are at least n - 4k + 2
vertices for which |L(v)| ≥ k and |W(v)| ≥ k, for all 1 ≤ k ≤ ⌊n/4⌋.

Proof: Let S_k be the set of vertices whose out-degree is less than k. The
cardinality of S_k is no more than 2k - 1, since the tournament induced on S_k
contains a vertex whose out-degree is at least ⌊|S_k|/2⌋. Similarly, there are at
most 2k - 1 vertices whose in-degree is less than k. Therefore, at least
n - 4k + 2 vertices have both in- and out-degrees at least k. □

Lemma 3.4 says, for example, that at least half the vertices in a tournament
have both in- and out-degrees at least n/8. This implies that the following
randomized algorithm will be very efficient:

procedure R_PATH(T)

(1) If T has one vertex then return the unique vertex of T.

(2) Select a random vertex, v, of T.

(3) In parallel find H_1 = R_PATH(L(v)) and H_2 = R_PATH(W(v)).

(4) Return the path (H_1, v, H_2).
end R_PATH.
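
The randomized procedure is equally short in code. The Python sketch below is
a sequential simulation under our own conventions (dom[u] is the set of
vertices dominated by u; the random source `rng` is passed in explicitly); it
also reports the recursion depth reached, which is the quantity bounded in the
next lemma.

```python
import random


def r_path(vertices, dom, rng, depth=0):
    """Return (Hamiltonian path, deepest recursion level reached)."""
    vs = list(vertices)
    if len(vs) <= 1:                              # step (1), plus the empty case
        return vs, depth
    v = rng.choice(vs)                            # step (2): random pivot
    lost = [u for u in vs if v in dom[u]]         # L(v)
    won = [u for u in vs if u in dom[v]]          # W(v)
    h1, d1 = r_path(lost, dom, rng, depth + 1)    # step (3), run one after the other
    h2, d2 = r_path(won, dom, rng, depth + 1)
    return h1 + [v] + h2, max(d1, d2)             # step (4)
```

A call such as `r_path(range(n), dom, random.Random(0))` returns a valid
Hamiltonian path whatever pivots are drawn; only the depth depends on the
random choices.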

Lemma 3.5: The expected depth of recursion of R_PATH is O(log n).
Furthermore,

    for all c ≥ 10:  Pr[ recursion depth > c log_{8/7} n ] < n^{-c}.

Proof: Let us say that a given stage of the algorithm, whose input is a
sub-tournament on some number, s, of vertices, is successful if, for the vertex
chosen in step (2), both W(v) and L(v) have no more than 7s/8 vertices. Let x be
some vertex. We can describe the history of x throughout the algorithm by a
zero-one vector, V_x, where V_x[i] = 1 if and only if the i'th stage in which x
participated was successful. By definition, V_x contains at most log_{8/7} n 1's.
As is remarked above, the probability of success of any stage is at least 1/2.
Therefore, Pr[V_x[i] = 1] ≥ 1/2 independently for all i. From standard results on
tails of the binomial distribution we get:

    for all c ≥ 10:  Pr[ length of V_x > c log_{8/7} n ] < n^{-(c+1)}.

Now, since there are n vertices, the probability that some vector, V_x, is long
is at most n times the probability that a given V_x is long, which yields the
statement of the lemma. The statement about the expected depth of recursion is a
simple consequence. □

It turns out that repeatedly selecting a random vertex in each of the
sub-tournaments that are created is a nontrivial matter (if one wants to do it in
expected constant time). The difficulty arises from the fact that a vertex knows
which sub-tournament, S, it belongs to at any given stage, but it does not know
which other vertices belong to S. We will not discuss this here, only state that
it can be done. Thus our randomized algorithm works almost surely in time
O(log n) and uses O(n) processors (one for each vertex) on a probabilistic CRCW
PRAM, and is, therefore, optimal.

3.3.2  Hamiltonian  Cycle  and  Restricted  Path 

The  following  theorem,  due  to  Camion  [Camion]  (see  [BW]  page  173),  states 
exactly  when  a  tournament  has  a  Hamiltonian  cycle: 

Theorem  3.2:  A  tournament  is  Hamiltonian  if  and  only  if  it  is  strongly  connected. 

The "only if" part is trivial. The other direction is proven by induction, but a
similar proof to that of theorem 3.1 will not work here, since removal of a vertex
from a strongly connected tournament might result in a tournament which is not
strongly connected. A classical proof, due to Moon [Moon] (see [BW] page 173),
proves a stronger claim: a strongly connected tournament on n vertices has a cycle
of length k, for k = 3,4,...,n. We omit the proof.

Again, the proof yields an efficient algorithm, which seems sequential in
nature. For our parallel solution we introduce a new notion - a restricted
Hamiltonian path.

Definition: A restricted Hamiltonian path is a Hamiltonian path with a specified
endpoint (either the first or the last vertex, not both).

A natural question is - when does there exist a Hamiltonian path starting
(ending) at a given vertex, v? The next theorem gives the precise condition.

Definition: Let T be a tournament and v be a vertex in T. v is a source (sink) of
T if all vertices of T have directed paths from (to) v.

Theorem 3.3: A tournament, T, has a Hamiltonian path starting (ending) at
vertex v if and only if v is a source (sink) of T.

Proof: Again, the "only if" part is trivial. We prove the second direction of the
theorem for a source. The proof is symmetrical for a sink. The proof is by
induction on n, the order of T. For n = 1 the claim holds trivially. Assume the
claim for tournaments of n vertices. Let T be a tournament of order n+1, and let
v be a source of T. Using the inductive claim we need only show that W(v)
contains a source of G(V-{v}). By theorem 3.1, W(v) contains a Hamiltonian path
starting at, say, u. Thus u is a source of W(v). Furthermore, by assumption every
vertex in L(v) can be reached from some vertex in W(v). Thus u is a source of
G(V-{v}). □

Once again the proof implies a sequential algorithm. The key idea for an NC
algorithm for finding a Hamiltonian cycle in a strongly connected tournament is to
tie it to the problem of finding restricted Hamiltonian paths. The idea is that
each problem will be solved by recursive calls to the other. We start by giving an
alternative proof for theorem 3.3, this time using theorem 3.2.

Second Proof of theorem 3.3: Let C_1, C_2, ..., C_k be the strongly connected
components of T such that C_1 > C_2 > ... > C_k. Since v is a source of T, it must
lie in C_1. Since C_1 is strongly connected, it contains a Hamiltonian cycle, H_1.
Let H_2 be the path obtained by deleting from H_1 the unique arc entering v. We
note that H_2 is a Hamiltonian path of C_1 starting at v. Let H_3 be a Hamiltonian
path of {C_2, C_3, ..., C_k}. By construction, the last vertex of H_2 dominates
the first vertex of H_3, so (H_2, H_3) is a Hamiltonian path of T starting at
v. □

Now  we  return  to  theorem  3.2  and  prove  it  using  theorem  3.3. 

New Proof of theorem 3.2: Let T be a strongly connected tournament and let
v ∈ V(T). Let L_1 > L_2 > ... > L_q be the strongly connected components of L(v)
and W_1 < W_2 < ... < W_p be the strongly connected components of W(v). Since T is
strongly connected there must be some arc leaving W_1. Every such arc must go to
a vertex in L(v) (since, by definition, it cannot go to a vertex in W_i, i > 1, or
to v). Let:

    m = min{i | a > b for some a ∈ W_1, b ∈ L_i},

and let w_1 ∈ W_1, l_1 ∈ L_m be such that w_1 > l_1. Symmetrically, there must be
an arc entering L_1 and let:

    k = min{i | a > b for some a ∈ W_i, b ∈ L_1},

w_2 ∈ W_k, l_2 ∈ L_1 and w_2 > l_2. The construction is shown in fig. 3.1.

The existence of a Hamiltonian cycle of T is shown by demonstrating several
paths and the connections between their endpoints. These paths are shown in fig.
3.2. The paths are the following:

(1) A Hamiltonian path of W_1 ending at w_1.

(2) A Hamiltonian path of {L_m, L_{m+1}, ..., L_q} starting at l_1.

(3) The vertex v.

(4) A Hamiltonian path of {W_k, W_{k+1}, ..., W_p} ending at w_2.

(5) A Hamiltonian path of L_1 starting at l_2.

(6) A Hamiltonian path of {W_2, W_3, ..., W_{k-1}, L_2, L_3, ..., L_{m-1}}.

We claim that the concatenation of the paths above in the order
(1),(2),(3),(4),(5),(6) forms a Hamiltonian cycle of T. First notice that each of
the paths specified does, in fact, exist. For the restricted paths ((1), (2), (4)
and (5)) this is a consequence of theorem 3.3. The only other fact we need to
verify is that the arcs between endpoints of the paths are in the desired
direction. The only non-obvious cases are the connections from path (5) to path
(6) and from path (6) to path (1). For showing this recall that we chose L_m and
W_k in a way that implies that L_2, L_3, ..., L_{m-1} all dominate W_1 and
W_2, W_3, ..., W_{k-1} are all dominated by L_1. Thus the last vertex of path (5)
must dominate the first vertex of path (6). Similarly, the last vertex of path (6)
must dominate the first vertex of path (1).
Notice that both endpoints of path (6) may be either in L(v) or in W(v). □

This new proof gives an approach for an NC algorithm - by selecting v to be a
"mediocre" vertex we break the problem into several subproblems of bounded size:
subgraphs (1), (2), (3), (4) and (5) all have at most (3/4)n vertices. However,
subgraph (6) (the union of components W_2, ..., W_{k-1} and L_2, ..., L_{m-1}) may
be very large. In fact it may contain all but five vertices of T, since v, w_1,
w_2, l_1 and l_2 are the only vertices guaranteed to be outside of this subgraph.

It turns out that this apparent obstacle is non-existent! The critical observation
is that the Hamiltonian path we need to find in (6) is not restricted. Therefore
we can use procedure PATH for finding this path, and need not worry about the
size of this subproblem. Thus the problem of finding a Hamiltonian cycle (or
restricted Hamiltonian path) on n vertices breaks down into several similar problems,
each on no more than (3/4)n vertices, and one easier problem on at most n vertices.

The algorithms for Hamiltonian cycle and restricted path follow. Note that
the solution to the Hamiltonian cycle problem is very symmetrical, as demonstrated
in fig. 3.2.

procedure RESTRICTED_PATH(T, endpoint, u)

(1) Let n = order of T.

(2) If n = 1 then return the unique vertex of T.

(3) Find strongly connected components C_1 > C_2 > ... > C_k of T.

(4) If endpoint = 'start' then

(4.1) In parallel find

H_1 = CYCLE(C_1)
H_2 = PATH({C_2, ..., C_k}).

(4.2) Let H_1 = H_1 - {unique arc into u}.

(5) If endpoint = 'end' then

(5.1) In parallel find

H_1 = PATH({C_1, ..., C_{k-1}})
H_2 = CYCLE(C_k)

(5.2) Let H_2 = H_2 - {unique arc out of u}.

(6) Return the path (H_1, H_2).
end RESTRICTED_PATH.


procedure CYCLE(T)

(1) Let n = order of T.

(2) If n = 1 then return the unique vertex of T.

(3) Find a vertex, v ∈ T, whose in-degree and out-degree in T are both at least
⌊n/4⌋.

(4) Find strongly connected components L_1 > ... > L_q of L(v) and W_1 < ... < W_p
of W(v).

(5) In parallel find

m = min{i | a > b for some a ∈ W_1, b ∈ L_i},
k = min{i | a > b for some a ∈ W_i, b ∈ L_1},

and w_1 ∈ W_1, l_1 ∈ L_m, w_2 ∈ W_k, l_2 ∈ L_1 such that w_1 > l_1 and w_2 > l_2.

(6) In parallel find

H_1 = RESTRICTED_PATH(W_1, 'end', w_1)
H_2 = RESTRICTED_PATH({L_m, ..., L_q}, 'start', l_1)
H_3 = RESTRICTED_PATH({W_k, ..., W_p}, 'end', w_2)
H_4 = RESTRICTED_PATH(L_1, 'start', l_2)
H_5 = PATH({W_2, ..., W_{k-1}, L_2, ..., L_{m-1}})

(7) Return the cycle (v, H_3, H_4, H_5, H_1, H_2, v)
end CYCLE.
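Step (3) of CYCLE can be illustrated by a plain sequential scan. The following is a minimal sketch, not the parallel implementation: it assumes the tournament is given as a boolean matrix `beats` (a hypothetical encoding in which `beats[a][b]` means that a dominates b), and the function name is illustrative only.

```python
def find_mediocre_vertex(beats):
    """Return a vertex whose in-degree and out-degree are both >= n // 4, or None.

    beats[a][b] is True when a dominates b (assumed adjacency-matrix encoding).
    """
    n = len(beats)
    threshold = n // 4
    for v in range(n):
        out_deg = sum(beats[v])
        in_deg = n - 1 - out_deg  # every other vertex either wins or loses
        if out_deg >= threshold and in_deg >= threshold:
            return v
    return None


# Rotational tournament on 7 vertices: a dominates a+1, a+2 and a+4 (mod 7).
n = 7
rotational = [[(b - a) % n in (1, 2, 4) for b in range(n)] for a in range(n)]
v = find_mediocre_vertex(rotational)
```

The scan inspects all n^2 entries, which matches the O(n^2) sequential cost of finding a "mediocre" vertex mentioned in the text.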

We now indicate how these algorithms can be performed using O(n^2/log n)
processors in O(log^2 n) time on a CREW PRAM. Finding strongly connected components
can be done using these resources, as described in section 3.2. Finding the
minimum-index component that has an arc in a given direction to another component
can be done by a standard prefix computation on a subset of the arcs of T
(e.g. [Fich]). Finally, in each stage we need to compute PATH. This seems to be a
problem since PATH itself takes O(log^2 n) time and the recursion depth is O(log n).
The observation here is that the result returned from PATH is not required in
order to generate the recursive calls to CYCLE and RESTRICTED_PATH. Therefore
all the calls to PATH can be performed separately from the main recursion
(for example after completing it), and then all the paths (restricted and
non-restricted) can be connected together in the appropriate manner. Since no vertex
or arc appears in more than one call to PATH, this additional step (of all calls to
PATH) can be done with the stated resources.

3.3.3  Open  Problems 

We have shown that finding a Hamiltonian path and a restricted Hamiltonian
path in a tournament are both in NC. A natural question is: what is the complexity
of finding a doubly restricted Hamiltonian path in a tournament, T, i.e. a
Hamiltonian path from a specified vertex, a, to another specified vertex, b? We know
how to solve this problem in NC if any of the graphs T, T-{a}, T-{b} or
T-{a,b} is not strongly connected. However, if all these graphs are strongly
connected, we do not even know if the problem is solvable in polynomial time.

Another interesting problem is whether there is an efficient deterministic NC
algorithm for finding a Hamiltonian path in a tournament. As stated above, the
sequential complexity of this problem is O(n log n), whereas the complexity of
finding a "mediocre" vertex is O(n^2). Therefore a totally different approach is
required to solve the path problem efficiently in parallel. It might be possible to
show that no such algorithm exists by proving that any algorithm that asks about
n arcs in one step needs many steps (i.e. more than poly-log) in the worst case
before it discovers a Hamiltonian path in a tournament.
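For orientation, one classical way to reach the O(n log n) sequential bound (counted in arc queries) is binary insertion. The sketch below is an illustration under that assumption; it is not claimed to be the procedure PATH of this chapter, and the matrix encoding `beats` and the function names are hypothetical.

```python
def hamiltonian_path(beats):
    """Return a Hamiltonian path of the tournament as a list of vertices.

    beats[a][b] is True when a dominates b. Each vertex is placed by binary
    search, so only O(n log n) arcs are queried.
    """
    path = []
    for x in range(len(beats)):
        if not path or beats[x][path[0]]:
            path.insert(0, x)          # x dominates the current first vertex
        elif beats[path[-1]][x]:
            path.append(x)             # the current last vertex dominates x
        else:
            # Invariant: path[lo] dominates x and x dominates path[hi].
            lo, hi = 0, len(path) - 1
            while hi - lo > 1:
                mid = (lo + hi) // 2
                if beats[path[mid]][x]:
                    lo = mid
                else:
                    hi = mid
            path.insert(hi, x)
    return path


def is_hamiltonian_path(beats, path):
    """Check that path visits every vertex once and follows arcs of the tournament."""
    return sorted(path) == list(range(len(beats))) and all(
        beats[a][b] for a, b in zip(path, path[1:]))
```

Python's `list.insert` takes linear time, so the running time of this sketch is O(n^2) even though the number of arc queries is O(n log n); a balanced search structure would be needed to match the bound in full.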

3.4  The  Tournament  Construction  Problem 
3.4.1  The  Upset  Sequence 

Our approach for constructing a specified tournament is based on what we
call the upset sequence of a tournament, T, which describes the difference
between T and a transitive tournament. If we list the vertices according to their
scores in non-decreasing order, then an upset is when a vertex, u, dominates some
other vertex appearing later than u in the list. We call an arc corresponding to an
upset a reverse arc. Transitive tournaments are exactly those tournaments that
contain no upsets.

Definition: Let s_1 ≤ ... ≤ s_n be the score list of a tournament, T, and let v_i be the
vertex of score s_i (for all 1 ≤ i ≤ n). The upset sequence of T is the sequence, u,
where u_k is the number of upsets between {v_1, ..., v_k} and {v_{k+1}, ..., v_n} (for all
1 ≤ k ≤ n-1).

The score list uniquely determines the upset sequence (and vice-versa):

Lemma 3.6: Let T be a tournament with score list s and upset sequence u. Then
for all 0 ≤ k ≤ n-1:

    u_k = \sum_{i=1}^{k} s_i - \binom{k}{2}

Proof: There are exactly \binom{k}{2} arcs in the subgraph induced on
{v_1, ..., v_k}, since it is also a tournament. Therefore the right hand side
describes the number of arcs whose tail is in {v_1, ..., v_k} but whose head isn't. □

Corollary 3.1: For all 1 ≤ k ≤ n-1:

    s_k = u_k - u_{k-1} + k - 1
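Lemma 3.6 and corollary 3.1 are easy to check numerically. The following sketch computes the upset sequence from the score list by the formula of the lemma and, for comparison, by counting upsets across each cut directly. The boolean matrix `beats` and both function names are assumptions made for this illustration.

```python
from itertools import accumulate


def upset_sequence(scores):
    """Upset sequence u_1..u_{n-1} from a non-decreasing score list (lemma 3.6)."""
    prefix = list(accumulate(scores))
    return [prefix[k - 1] - k * (k - 1) // 2 for k in range(1, len(scores))]


def upset_sequence_by_counting(beats):
    """Count upsets across each cut directly; beats[a][b] means a dominates b.

    Vertices are assumed to be numbered by non-decreasing score.
    """
    n = len(beats)
    return [sum(beats[a][b] for a in range(k) for b in range(k, n))
            for k in range(1, n)]


# A 3-cycle plus a vertex that dominates everyone: scores 1, 1, 1, 3.
beats = [
    [False, True, False, False],
    [False, False, True, False],
    [True, False, False, False],
    [True, True, True, False],
]
scores = [sum(row) for row in beats]
u = upset_sequence(scores)
```

For this example both computations give the upset sequence (1, 1, 0), and taking u_0 = 0 the identity of corollary 3.1 recovers the scores 1, 1, 1 for k = 1, 2, 3.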

How can we use the upset sequence? Our approach is to construct a tournament
with a given score list by starting with a transitive tournament and reversing
some of its arcs. The upset sequence of the desired tournament gives us a handle
on which arcs to reverse. We will be aided by a graphical representation of the
upset sequence, which we now discuss.

A sequence of non-negative integers can be represented graphically by its
histogram. We will treat the histogram as a rectilinear polygon (and call it, simply, a
polygon), which is divided into squares, each of which has integral x and y
coordinates. The x coordinate is a square's column and the y coordinate is its height. An
example of a polygon is shown in fig. 3.3. Any collection of squares of a polygon is
a sub-polygon. A maximal set of consecutive squares at the same height is called
a slice. Note that a polygon can have several slices at the same height (if it is not
convex). A (horizontal) segment is a consecutive set of squares, all in the same slice.
We denote a segment or slice by [l,r] or by [l,r;h], where l and r are, respectively,
the columns of the leftmost and rightmost squares it contains, and h is its height.

A polygon representing the upset sequence of a tournament will be called an
upset polygon.


An elementary property of a polygon, which follows from its definition, is:

Proposition 3.1: The slices of a polygon form a nested structure: if [l_1,r_1] and
[l_2,r_2] are slices with l_1 ≥ l_2 then either l_1 > r_2 or r_1 ≤ r_2.

We define the following partitioning problem: Given a rectilinear polygon as
shown in fig. 3.3, partition each of its slices into segments such that no two segments
in the partition agree on both endpoints. Such a partition is said to be
valid, and is defined by the set of segments it contains. An example of a valid partition
is illustrated in fig. 3.4. The partition is {[1,14], [2,4], [2,5], [2,14], [4,4],
[4,5], [5,5], [5,14], [8,8], [8,9], [8,10], [8,11], [9,11], [10,11], [11,12]}.

Fig. 3.4: A valid partition of the polygon of fig. 3.3.

Lemma 3.7: A valid partition of the upset polygon yields a solution to the
construction problem.

Proof: Let {[l_i,r_i] | 1 ≤ i ≤ m} be the set of segments in a valid partition of an upset
polygon representing a sequence u corresponding to the score list s = s_1, ..., s_n.
Let T be the tournament obtained by taking the n vertex transitive tournament
and reversing, for each 1 ≤ i ≤ m, the arc between v_{l_i} and v_{r_i+1}, so that
v_{l_i} dominates v_{r_i+1}. Since no two segments agree on both endpoints, these arcs
are distinct. By inspection, the number of reverse arcs crossing the cut
({v_1, ..., v_k}, {v_{k+1}, ..., v_n}) is exactly u_k. Therefore (by corollary 3.1),
T is a tournament with score list s. □

Note that each slice in fig. 3.4 is partitioned into at most two segments. This
is not a coincidence.

Definition: A 2-partition is a valid partition in which every slice is partitioned
into at most 2 segments. A slice which is partitioned into at most 2 segments
is 2-partitioned.

We  will  deal  only  with  2-partitions  because  of  the  following: 

Lemma  3.8:  If  a  polygon  has  a  valid  partition,  then  it  has  a  2-partition. 

Proof: Let P be a valid partition of some polygon, which is not a 2-partition. Let S
be a slice which is partitioned into more than 2 segments such that all slices lying
above S are 2-partitioned. We will prove the lemma by showing how to transform
P into another valid partition in which S is 2-partitioned and the partition of
slices above S is unchanged.

Let the segments comprising S in P be, from left to right, [l_1,r_1], ..., [l_k,r_k]
(where k > 2). If either [l_1,r_{k-1}] or [l_2,r_k] does not appear in P, then the partition
of S can be replaced with {[l_1,r_{k-1}], [l_k,r_k]} or {[l_1,r_1], [l_2,r_k]} respectively. If both
appear, then at least one, say [l_1,r_{k-1}], must appear in a slice below S (call this
slice T). This follows from the assumption that all slices lying above S are
2-partitioned and from the nesting property (proposition 3.1). Now, simply assign
the segment [l_1,r_{k-1}] to S and the segments [l_1,r_1], ..., [l_{k-1},r_{k-1}] to T. □

Not every rectilinear polygon of the type discussed has a valid partition. Two
examples are shown in fig. 3.5.

Fig. 3.5: Examples of polygons which have no valid partition (panels (1) and (2)).

We will show, however, that every upset polygon has a 2-partition. A few more
definitions are required for this: a left (right) face is a maximal vertical line segment
on the left (right) part of the boundary of a polygon. Face k, if it exists, is
the face between columns k-1 and k. Two faces, L and R, are opposing if there is
some slice starting at L and ending at R. The width, w(F), of a face, F, is the
minimum distance between it and any of its opposing faces (where distance is
measured by number of squares). The length of a face F (i.e. the number of slices it
touches) is denoted by l(F).

Lemma 3.9: A polygon, D, has a 2-partition if the length of every face of D is no
more than half its width.

Proof: We prove the lemma by induction on the height of D. If the height is 1
then D clearly has a 2-partition. Assume the claim holds for all polygons of height
k-1, and let k be the height of D. Let D' be the polygon obtained by removing the
bottom slice from D. By the inductive assumption, D' has a 2-partition, P. We will
show that P can be extended to a 2-partition of D. Let L and R be, respectively,
the left and right faces bounding the bottom level of D. P contains l(L)-1 segments
starting at L and l(R)-1 segments ending at R. By the condition of the
lemma,

    width of bottom slice ≥ max{w(L), w(R)} ≥ l(L) + l(R)

Therefore, by the pigeonhole principle, there are two segments that partition the
bottom slice, which are not contained in P. Thus P can be extended to become a
2-partition of D. □

Lemma 3.10: In an upset polygon the length of every face is no more than half its
width.

Proof: Let Δ(k) be the difference in height between the highest square with
x-coordinate k and the highest square with x-coordinate k-1. In other words, if F is
a left face bounding squares with x-coordinate k, then Δ(k) = l(F). If F is a right
face then Δ(k) = -l(F). Using corollary 3.1:

    Δ(k) = u_k - u_{k-1} = s_k - k + 1

Since s_k is non-decreasing, it follows that:

    (*) for all 2 ≤ k ≤ n-1, Δ(k) ≥ Δ(k-1) - 1

Say face k is a left face, L. The nearest opposing face of L occurs to the right of
the first value, r, such that r > k and Δ(k+1) + Δ(k+2) + ... + Δ(r) < 0. The smallest
possible r value can occur (by (*)) when Δ(k) = Δ(k+1) + 1 = ... = Δ(r) + r - k. In this
case:

    w(L) = r - k + 1 = 2Δ(k) = 2l(L)

A symmetric argument works for right faces. □
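The step from corollary 3.1 to inequality (*) in the proof above can be written out as a short derivation; it uses only the identity Δ(k) = s_k - k + 1 and the fact that the score list is non-decreasing:

```latex
\begin{align*}
\Delta(k) - \Delta(k-1)
  &= \bigl(s_k - k + 1\bigr) - \bigl(s_{k-1} - (k-1) + 1\bigr) \\
  &= s_k - s_{k-1} - 1 \\
  &\ge -1 \qquad \text{since } s_k \ge s_{k-1}.
\end{align*}
```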

Theorem  3.4:  Every  upset  polygon  has  a  2-partition. 
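The inductive proof of lemma 3.9 suggests a simple sequential procedure: handle the slices from top to bottom and give each one a split into two segments that have not been used yet. The sketch below follows that reading and then applies lemma 3.7 to build the tournament. It is a sequential illustration only, not the parallel face-based algorithm of section 3.4.2; all function names and the 0/1 matrix encoding are assumptions of the sketch.

```python
from itertools import accumulate


def upset_sequence(scores):
    """u_k = s_1 + ... + s_k - C(k, 2) for k = 1..n-1 (lemma 3.6)."""
    prefix = list(accumulate(scores))
    return [prefix[k - 1] - k * (k - 1) // 2 for k in range(1, len(scores))]


def slices_top_down(u):
    """List the slices (l, r) of the histogram of u, highest first (1-based columns)."""
    result = []
    for h in range(max(u, default=0), 0, -1):
        k = 1
        while k <= len(u):
            if u[k - 1] >= h:
                l = k
                while k <= len(u) and u[k - 1] >= h:
                    k += 1
                result.append((l, k - 1))
            else:
                k += 1
    return result


def greedy_2_partition(u):
    """Split each slice into at most two segments that are not used elsewhere."""
    used = set()
    for l, r in slices_top_down(u):
        for c in range(l, r):
            if (l, c) not in used and (c + 1, r) not in used:
                used.update([(l, c), (c + 1, r)])
                break
        else:  # no free split: fall back to the undivided slice
            if (l, r) in used:
                raise ValueError("greedy choice failed; not an upset polygon?")
            used.add((l, r))
    return used


def build_tournament(scores):
    """Tournament with the given non-decreasing score list, as a boolean matrix."""
    n = len(scores)
    # Transitive tournament: the higher-numbered vertex dominates.
    beats = [[j < i for j in range(n)] for i in range(n)]
    for l, r in greedy_2_partition(upset_sequence(scores)):
        # Segment [l, r] stands for the reverse arc from v_l to v_{r+1} (lemma 3.7).
        beats[l - 1][r], beats[r][l - 1] = True, False
    return beats


regular = build_tournament([2, 2, 2, 2, 2])
```

The greedy choice always succeeds on an upset polygon by the same pigeonhole count as in lemma 3.9, combined with lemma 3.10. The sketch enumerates squares row by row, so its cost grows with the height of the polygon; the compressed representation of section 3.4.3 avoids this.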

3.4.2  2-Partitioning  the  Upset  Polygon 

As described in the previous section, our algorithm works as follows: given the
score list, s, we compute its corresponding upset sequence u and construct a
2-partition, P, of the upset polygon. In the output tournament, for all 1 ≤ i < j ≤ n,
v_i dominates v_j if and only if [i,j-1] ∈ P.

What remains to be shown is how to compute a 2-partition of an upset
polygon, U, efficiently in parallel. Basically, our approach is to construct the partition
according to faces. We first observe that it is a simple task to partition a set
of slices with a common face as follows: say the common face is a left face. Let the
set of slices be, from top to bottom, S_1, ..., S_m, where S_i = [l,r_i] for all 1 ≤ i ≤ m.
Then S_i will be partitioned into the segments [l,l+i] and [l+i+1,r_i]. This is
shown in fig. 3.5. Such a partition is always possible given lemma 3.10. A symmetric
partition exists for slices sharing a right face.

Fig. 3.5: 2-partitioning a set of slices with a common left face.

If we simultaneously partition the entire polygon in the manner described (according
to left faces), the resulting partition might not be valid, since a right face can
be opposite several left faces. Our solution is to have every slice "belong" to one of
(the two) faces it touches, and to be partitioned accordingly. More specifically, it
belongs to the dominant face according to a domination relationship defined as follows:
a left face, L, dominates an opposing right face, R, unless the top slice touching
L touches R but the top slice touching R does not touch L (in other words, R
is the highest face opposing L but not vice-versa).

Theorem 3.5: Let S = [l,r;h] be a slice belonging to face F. Let S_F = [l',r';h'] be
the highest slice belonging to F. Say we partition S into 2 segments such that the
length of the segment touching F is h'-h+1. If we perform this partitioning for
all the slices of an upset polygon, U, then the result is a (valid) 2-partition of U.

Proof: First we note that if two slices belong to the same face, they must be of
different height, so their partition cannot conflict (i.e. create segments with identical
endpoints). Therefore, the only conceivable way in which a conflict can occur is
from partitioning two slices, S_1 and S_2, that belong to faces L_1 and R_2 respectively,
where L_1 and R_2 are left and right faces. Furthermore, L_1 and R_2 must be
opposing faces because of the nesting property (proposition 3.1).

We note that the set of slices belonging to some face is consecutive. Say L_1
dominates R_2 (the other case is symmetrical). Then the right endpoint of a segment
created from a slice belonging to L_1 is at distance at most l(L_1) from L_1 and
the left endpoint of a segment created from a slice belonging to R_2 is at distance at
most l(R_2)-1 from R_2. Now we apply lemma 3.10: the distance between L_1 and
R_2 is at least l(L_1)+l(R_2). Therefore all right endpoints of segments created from
slices belonging to L_1 are less than all left endpoints of segments created from
slices belonging to R_2, so no conflict can occur. □


3.4.3  Implementation  Details 

We now describe in detail a parallel implementation of the tournament construction
algorithm described above. Our algorithm works in time O(log n) and
uses O(n^2/log n) processors on a concurrent read - exclusive write (CREW) PRAM,
where n is the number of vertices in the tournament. Our parallel algorithm is
optimal, since the size of the output is Θ(n^2). Some of the procedures will be easier
to describe as using O(n^2) processors and working in constant time. Each such
procedure can clearly be slowed down to work in time O(log n) using only
O(n^2/log n) processors.

Let U be the upset polygon corresponding to the input score list. The area of
U (i.e. the number of squares it contains) can be Θ(n^3), since its height can be
Θ(n^2) (for example, the area of an upset polygon of a regular tournament is
(n^3 - n)/12). The first step we perform is to "compress" U to get an O(n^2)
representation.

Let l_1 < l_2 < ... < l_m be the sorted list of values of the upset sequence u (l_i is
the i'th smallest u value). The i'th level of U is the sub-polygon with
y-coordinates between l_{i-1}+1 and l_i (where l_0 = 0). It is easy to see that each level
is a collection of rectangles. In other words, for every column j and level r, squares
in j appear either in all the heights of r or in none of them. We can, thus, talk
about "slices at level r". We represent U by a zero-one matrix, LEVEL, where
LEVEL[r,j] = 1 if and only if u_j ≥ l_r. For a complete description we also keep a vector
HEIGHT, where HEIGHT[r] is the height of the highest slice in level r.
LEVEL can be computed using O(n^2) processors, each computing one entry in constant
time.
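The compression step can be sketched sequentially as follows. Each list comprehension entry corresponds to what a single processor would compute. The sketch assumes 0-based indices and considers only the positive values of u as levels; both are conventions of this illustration rather than of the text, and the function name is hypothetical.

```python
def compress(u):
    """Return (levels, LEVEL, HEIGHT) for an upset sequence u.

    levels[r] is the r-th smallest positive value occurring in u.
    LEVEL[r][j] == 1 iff column j+1 reaches level r, i.e. u[j] >= levels[r].
    HEIGHT[r] is the height of the highest slice in level r, which is levels[r].
    """
    levels = sorted({x for x in u if x > 0})
    LEVEL = [[1 if x >= lr else 0 for x in u] for lr in levels]
    HEIGHT = list(levels)
    return levels, LEVEL, HEIGHT


# Upset sequence of a regular tournament on 5 vertices.
levels, LEVEL, HEIGHT = compress([2, 3, 3, 2])
```

Here the polygon has height 3 but only two levels: the first covers heights 1 and 2 (all four columns), the second covers height 3 (columns 2 and 3).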

We  now  list  the  steps  of  the  computation.  In  each  step  a  matrix  or  vector  is 
computed,  and  in  the  final  step  a  processor  is  assigned  to  each  slice  and  2- 
partitions  it.  We  start  by  listing  the  matrices  and  vectors  computed  and  then 
describe  in  detail  how  each  step  is  implemented. 

A vector TOP_LEVEL. TOP_LEVEL[k] is the maximum level, r, such that
LEVEL[r,k] = 1 (i.e. the highest level of column k).

A matrix ENDPOINT. If there is a slice [i,j] in level r, then ENDPOINT[r,j] = i
and ENDPOINT[r,i] = j. If no slice begins or ends at column j in level r then
ENDPOINT[r,j] = empty.

Matrices TOP and BOTTOM. TOP[i,j] is the top level in which slice [i,j]
appears. BOTTOM[i,j] is the bottom level in which slice [i,j] appears.
(Again, an entry is empty if no such slice exists.)

Face domination matrix, FD. FD[i,j] = 1 if face j dominates face i. FD[i,j] = 0 if
face i dominates face j. FD[i,j] = empty if faces i and j are not opposing. (See
section 3.4.2 for the definition of face domination.)

Vector TOP_SLICE. TOP_SLICE[k] is the level of the highest slice that belongs
to face k (the face between columns k-1 and k).

TOP_LEVEL can be computed in constant time by assigning a processor to each
entry of LEVEL to check if it is 1 and the entry above it is 0.

The r'th row of ENDPOINT is computed using O(n/log n) processors in
O(log n) time by a balanced binary tree computation ([MR]). We "plant" a balanced
complete binary tree with n-1 leaves on level r of the upset polygon. Each
node, N, in the tree represents a range of entries in row r of LEVEL, between
columns l(N) and r(N). A node computes three functions:

propagate(N) - is true iff all the entries represented by N are 1.

start_right(N) - the first column of a slice starting between l(N) and r(N) and
ending to the right of r(N)-1.

end_left(N) - the last column of a slice ending between l(N) and r(N) and starting
to the left of l(N)+1.

An internal node, N, has two children, N_left and N_right, where l(N_left) = l(N),
r(N_right) = r(N) and r(N_left) = l(N_right)-1. Then we have:

propagate(N) = propagate(N_left) and propagate(N_right).
start_right(N) = if propagate(N_right) then start_right(N_left) else start_right(N_right).
end_left(N) = if propagate(N_left) then end_left(N_right) else end_left(N_left).

The leaves of the tree represent single entries. If an entry is 0 then
propagate = false and end_left and start_right are both empty. If an entry is 1
then propagate = true and end_left and start_right are both j (for the leaf
representing entry j). A node computes its functions after its children have computed
theirs. Furthermore, N writes end_left(N_right) in
ENDPOINT[r,start_right(N_left)] and start_right(N_left) in
ENDPOINT[r,end_left(N_right)]. Note that a value may be overwritten several times.
After completing computing the functions for the whole tree, for each entry, j, if
LEVEL[r,j-1] = 1 and LEVEL[r,j+1] = 1, then ENDPOINT[r,j] is set to empty.

It takes O(log n) time for the node functions to be evaluated for the entire
tree. The whole computation can be done with O(n/log n) processors by a standard
load-balancing trick, as described in [MR]. Proof that this procedure works
correctly is straightforward, and is omitted.
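The tree computation for one row of ENDPOINT can be simulated sequentially by a post-order recursion, which evaluates children before their parent just as the tree does level by level. The sketch below is such a simulation under two stated assumptions that go slightly beyond the recurrences as written: when the child named by the recurrence has an empty value, the value of the other child is used instead, and every 1-entry is initialised as its own endpoint so that slices of width 1 are reported (by lemma 3.10 these do not arise in upset polygons). The function name and the dictionary representation are hypothetical.

```python
def endpoint_row(row):
    """ENDPOINT for one level; row[j-1] is LEVEL[r, j] for columns j = 1..len(row).

    Returns a dict mapping each slice endpoint to the slice's other endpoint.
    """
    n = len(row)
    # Assumption of this sketch: a 1-entry starts out as its own endpoint.
    endpoint = {j: j for j in range(1, n + 1) if row[j - 1]}

    def visit(lo, hi):
        """Return (propagate, start_right, end_left) for columns lo..hi."""
        if lo == hi:
            return (True, lo, lo) if row[lo - 1] else (False, None, None)
        mid = (lo + hi) // 2
        p_left, s_left, e_left = visit(lo, mid)
        p_right, s_right, e_right = visit(mid + 1, hi)
        if s_left is not None and e_right is not None:
            # A slice crosses the boundary between the two children.
            endpoint[s_left] = e_right
            endpoint[e_right] = s_left
        # Fall back to the other child when the named child's value is empty.
        start_right = (s_left if s_left is not None else s_right) if p_right else s_right
        end_left = (e_right if e_right is not None else e_left) if p_left else e_left
        return p_left and p_right, start_right, end_left

    if n:
        visit(1, n)
    # Clean-up: interior columns of a slice are not endpoints.
    for j in range(2, n):
        if row[j - 2] and row[j] and j in endpoint:
            del endpoint[j]
    return endpoint


ends = endpoint_row([0, 1, 1, 1, 0, 1, 1])
```

For the example row the slices are [2,4] and [6,7], so the entries for columns 2, 4, 6 and 7 point at each other's partner and column 3 is cleared by the clean-up step.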

TOP and BOTTOM are computed by having a processor for each entry of
ENDPOINT. Processor [r,i] writes r in TOP[i,j] if ENDPOINT[r,i] = j and
ENDPOINT[r+1,i] ≠ j. Similarly for BOTTOM.

FD[i,j] = 1 if ENDPOINT[TOP[i,j]+1,j] = empty and (either
ENDPOINT[TOP[i,j]+1,i] ≠ empty or i < j).

For computing TOP_SLICE, let t = ENDPOINT[TOP_LEVEL[k],k]. Then
[k,t] is the highest slice touching face k. If FD[k,t] = 1 then TOP_SLICE[k] is
equal to TOP_LEVEL[k]. Otherwise, it is one level below BOTTOM[k,t] (unless
face k has no other slices than [k,t], which can be checked by looking up
LEVEL[BOTTOM[k,t]-1,k]).

Finally we partition each of the slices. Let s = [l,r;h] be a slice. We use FD to
find if s belongs to face l or face r. Then we use TOP_SLICE and HEIGHT to find
the height, h', of the highest slice belonging to that face. Now we can partition s
according to its height, h, and h' as described in theorem 3.5.

We need to show how to assign processors to slices. One way to do it is as follows:
a vector, V, is created with one entry for each left face, with the entry being
the length of the face. A vector, P, of partial sums of V is computed. This vector
contains, essentially, an enumeration of the slices. Let a be the total number of
slices of U. We assign log n consecutive slices to each of a/log n processors. Each
processor finds its first slice in time O(log n) by a binary search on P. After that,
each of the successive slices is accessed in constant time.
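The assignment of processors to slices can be sketched as follows. The loop over processors stands in for what each processor would do independently; the function name, the 0-based numbering of faces and slices, and the parameter `chunk` (which plays the role of log n) are assumptions of this illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def first_slice_of_each_processor(face_lengths, chunk):
    """For each processor, locate its first slice as (face index, offset in face).

    face_lengths is the vector V (one entry per left face); chunk is the number
    of consecutive slices handled by one processor.
    """
    partial_sums = list(accumulate(face_lengths))      # the vector P
    total = partial_sums[-1] if partial_sums else 0    # total number of slices
    starts = []
    for g in range(0, total, chunk):                   # g: global slice number
        face = bisect_right(partial_sums, g)           # binary search on P
        offset = g - (partial_sums[face - 1] if face else 0)
        starts.append((face, offset))
    return starts


starts = first_slice_of_each_processor([2, 3, 1], chunk=2)
```

With V = (2, 3, 1) the partial sums are (2, 5, 6); three processors start at slices 0, 2 and 4, which are the first slice of face 0, the first slice of face 1 and the third slice of face 1.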
CHAPTER FOUR

STRONG  ORIENTATION  OF  MIXED  GRAPHS 
AND  RELATED  AUGMENTATION  PROBLEMS 


4.1  Introduction 

The strong orientation problem is a problem in graph theory that stems from
the following question - can the streets of a city be all changed into one-way
streets so that every point is reachable from any other? There are two variants of
this problem: the first variant is when the input graph, G, is undirected (i.e. the
city initially contains only two-way streets). A more general situation is when G
is mixed, i.e. contains some directed arcs and some undirected edges. In both
cases, the problem is to assign orientations to the undirected edges of G to yield a
strongly connected digraph. If G is mixed we require that the directions of the
arcs of G are not altered by the orientation. If G admits such an orientation we
say it is strongly orientable.

What if G is not strongly orientable? We define the minimum strong augmentation
problem: find a minimum set of arcs (or edges) whose addition to G will
make it strongly orientable. Notice that we need only consider addition of arcs,
since in a strong orientation each edge is replaced by an arc. It is interesting to
consider the special cases when G is either completely directed or completely
undirected. In the first case the problem becomes - add a minimum set of arcs to a
digraph to make it strongly connected. When G is undirected the problem is (by
Robbins' theorem, as we shall see) - add a minimum set of edges to make G
bridge-connected, where a graph is bridge-connected if it is connected and has no
bridges. In this context a bridge is an edge whose removal disconnects the graph.
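Bridge-connectivity is easy to test sequentially with a depth-first search that tracks low-link values. The sketch below does so for a multigraph on vertices 0..n-1 with n ≥ 1. It is a sequential check only and says nothing about the parallel algorithms of this chapter; the function names and the edge-list encoding are assumptions of the illustration.

```python
import sys


def find_bridges(n, edges):
    """Return the bridges of an undirected multigraph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for idx, (a, b) in enumerate(edges):
        adj[a].append((b, idx))
        adj[b].append((a, idx))
    disc = [None] * n   # discovery time of each vertex
    low = [0] * n       # lowest discovery time reachable from the subtree
    bridges = []
    timer = 0

    def dfs(v, parent_edge):
        nonlocal timer
        disc[v] = low[v] = timer
        timer += 1
        for w, idx in adj[v]:
            if idx == parent_edge:
                continue            # do not reuse the edge we arrived by
            if disc[w] is None:
                dfs(w, idx)
                low[v] = min(low[v], low[w])
                if low[w] > disc[v]:
                    bridges.append(edges[idx])
            else:
                low[v] = min(low[v], disc[w])

    sys.setrecursionlimit(max(1000, 2 * n + 100))
    for v in range(n):
        if disc[v] is None:
            dfs(v, None)
    return bridges


def is_bridge_connected(n, edges):
    """True if the graph (n >= 1 vertices) is connected and has no bridges."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n and not find_bridges(n, edges)
```

For example, a triangle is bridge-connected, while two triangles joined by a single edge are not, since the joining edge is a bridge.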

The algorithms presented in this chapter all run in time O(log n) on a CRCW
PRAM, where n is the number of vertices in the input graph. There are two main
classes - algorithms for undirected graphs and algorithms for directed graphs. It
turns out that all our algorithms for undirected graphs use O(n + m) processors,
where m is the number of edges in the input graph. Our algorithms for digraphs
involve computing transitive closure, and therefore require O(M(n)) processors,
where M(n) is the sequential time for multiplying two n×n matrices. A realistic
bound is O(n^3/log n). Asymptotically fewer processors are required using fast matrix
multiplication methods (which have, so far, all been parallelizable). The current
best algorithm known works in time O(n^{2.376}) ([CW]). This difference between the
efficiencies of algorithms of the two classes is a common phenomenon in the literature,
and the algorithms in this chapter are another example.

A basic operation that we use many times is finding strongly connected components,
and it is not hard to show that the problems we solve are at least as hard
as finding strongly connected components. A big open problem in the field is
finding an algorithm to compute strongly connected components (or even, say, test
if a digraph is acyclic) more efficiently than by computing transitive closure. Note
that the linear-time sequential algorithm relies on depth-first search, which is not
known to be in NC, or even RNC, for directed graphs. Therefore the algorithms
we present are "optimal with respect to the state of the art".

The results presented in this chapter are:

(1) An NC algorithm for strongly orienting a mixed graph. Parallel algorithms
that appeared previously in the literature have been for strongly orienting
undirected graphs.

(2) NC algorithms for constructing a minimum-cardinality set of edges to make a
graph bridge-connected, and a minimum-cardinality set of arcs to make a digraph
strongly connected. The solution of the latter problem involves introducing the
notion of a dense matching, which can be computed efficiently in parallel.

(3) An NC algorithm for constructing a minimum-cardinality set of arcs to augment
a mixed graph into a strongly orientable mixed graph. Before our work no
report of even a polynomial-time solution to this problem appeared in the literature.
Independently of our research an optimal sequential algorithm has been published
by Gusfield ([Gusf]).

A few words about related hard problems. One natural question is to find a
strong orientation for which the diameter (i.e. maximum distance between two vertices)
of the resulting digraph is minimized. Chvatal and Thomassen ([CT]) have
shown that even the problem of deciding if there exists an orientation for which
the directed diameter is equal to 2 is NP-hard. On the other hand they show that
any graph with diameter d admits an orientation with diameter at most 2d^2 + 2d,
and for diameter, d, there are graphs of arbitrary connectivity for which any orientation
has diameter at least d^2/4 + d.

Another interesting problem is to find a minimum-weight strong augmentation.
Again this is NP-hard even in a very limited case - when the weights (of
potential arcs to be added) are all either 1 or 2. This follows trivially from reductions
in [ET] for both the directed and undirected case. By a similar reduction one
can show that finding a minimum cardinality strong augmentation from a given
subset of the arcs is NP-hard.

Definitions and notation: we define a mixed graph G = (V,E,A) as follows: V is
the vertex set; E is the edge set, where an edge is an unordered pair of vertices; A
is the arc set, where an arc is an ordered pair of vertices. We denote the vertex,
edge and arc sets of G by V(G), E(G) and A(G) respectively. Given an edge,
e = {u,v}, e ∈ E(G), orienting e means deleting e from E(G) and adding a new arc,
a, to A(G), where either a = (u,v) or a = (v,u). Undirecting an arc, a = (u,v), means
replacing it by the edge e = {u,v}. The underlying undirected graph of a mixed
graph, G, is the graph obtained by undirecting all its arcs. Directing an edge,
e = {u,v}, means replacing it by the pair of arcs (u,v) and (v,u). The
underlying directed graph of a mixed graph, G, is the graph obtained by directing
all its edges.

A path from u to v in a mixed graph, G, is a sequence of distinct vertices,
$u = v_1, v_2, \ldots, v_k = v$, such that, for all $1 \le i < k$, either
$\{v_i, v_{i+1}\} \in E(G)$ or $(v_i, v_{i+1}) \in A(G)$. We say that v is reachable
from u if there is a path from u to v. A mixed graph is connected if every vertex is
reachable from every other vertex (or, equivalently, its underlying directed graph
is strongly connected).

We will be constructing several graphs based on a given mixed graph,
$G = (V, E, A)$, which we define here:

$U(G) = (V, E)$ - The undirected part of G.

$D(G) = (V, A)$ - The directed part of G.

$S(G)$ - The strong component graph of G. This is a mixed multigraph (i.e. one
in which there can be several arcs and edges joining two vertices) whose
vertices are the strongly connected components of D(G). The edges and
arcs are those going between strongly connected components of D(G).


4.2  Background 

4.2.1  Theorems  and  Sequential  Complexity  of  Strong  Orientation 

The  solution  to  the  decision  problem  of  strong  orientation  for  undirected 
graphs  was  given  by  Robbins  in  1939: 

Theorem 4.1 (Robbins' theorem [Robbin]): An undirected graph is strongly
orientable if and only if it is bridge-connected.

The  necessity  of  this  condition  is  clear,  and  it  is  not  hard  to  prove  that  it  is 
also  sufficient.  One  proof  of  this  theorem  ([Rober])  is  obtained  by  showing  that  if  G 
is  bridge-connected  then  the  following  yields  a  strong  orientation:  let  T  be  a 
depth-first  search  tree  of  G;  Orient  all  edges  of  T  from  a  vertex  to  its  child  in  T, 
and  all  other  edges  from  a  vertex  to  its  ancestor.  A  nice  consequence  of  this  proof  is 
that  it  provides  a  linear-time  sequential  algorithm  for  constructing  a  strong  orien¬ 
tation  of  an  undirected  graph. 
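
The depth-first orientation described above can be sketched sequentially as follows. This is only an illustration under stated assumptions: the names `dfs_strong_orientation` and `is_strongly_connected` are ours, vertices are numbered 0..n-1, and the input graph is assumed to be connected (the search starts at vertex 0).

```python
from collections import defaultdict


def dfs_strong_orientation(n, edges):
    """Orient tree edges parent -> child and all other edges descendant -> ancestor.

    `edges` is a list of unordered pairs (u, v); the result lists one arc per edge,
    in the same order. Assumes the graph is connected.
    """
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    visited = set()
    oriented = [None] * len(edges)

    def visit(u):
        visited.add(u)
        for v, idx in adj[u]:
            if oriented[idx] is not None:
                continue                 # edge already handled from its other end
            oriented[idx] = (u, v)       # tree edge, or edge up to an ancestor
            if v not in visited:
                visit(v)

    visit(0)
    return oriented


def is_strongly_connected(n, arcs):
    """Check that vertex 0 reaches, and is reached by, every vertex."""
    def reaches_all(pairs):
        succ = defaultdict(list)
        for u, v in pairs:
            succ[u].append(v)
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    return reaches_all(arcs) and reaches_all([(v, u) for u, v in arcs])
```

On a bridge-connected input the returned arcs form a strongly connected digraph; on an input with a bridge they cannot, which is consistent with theorem 4.1.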

The solution for mixed graphs was given by Boesch and Tindell 41 years later,
yet is very similar to Robbins' theorem.

Theorem 4.2 ([BT]): A mixed graph, G, is strongly orientable if and only if it is
connected and its underlying undirected graph is bridge-connected.

Again,  the  conditions  are  clearly  necessary,  and  the  proof  that  they  are 
sufficient  is  not  complicated.  One  way  to  view  this  theorem  is  as  follows:  consider 
the  underlying  undirected  graph,  H,  of  G.  Assume  that  we  were  given  this  graph 
initially,  and  have  oriented  some  of  its  edges  to  obtain  G.  Then  we  can  complete 
this  orientation  into  a  strong  orientation  if  H  was  strongly  orientable  in  the  first 
place  and  if,  in  transforming  H  into  G,  we  have  not  ruined  connectedness. 
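
The two conditions of theorem 4.2 can be tested directly. The sketch below is a naive sequential check, not the parallel test used later; the function names are ours, vertices are assumed to be numbered 0..n-1, and bridges are found by deleting each link in turn.

```python
def _reach(n, pairs, start=0):
    """Set of vertices reachable from `start` along the ordered pairs."""
    succ = [[] for _ in range(n)]
    for u, v in pairs:
        succ[u].append(v)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_connected_mixed(n, edges, arcs):
    """Condition 1: the underlying directed graph is strongly connected."""
    directed = list(arcs) + [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    reverse = [(v, u) for u, v in directed]
    return len(_reach(n, directed)) == n and len(_reach(n, reverse)) == n


def is_bridge_connected(n, links):
    """Condition 2: the undirected graph on `links` is connected and has no bridge."""
    def connected(ls):
        return len(_reach(n, ls + [(v, u) for u, v in ls])) == n

    if not connected(links):
        return False
    return all(connected(links[:i] + links[i + 1:]) for i in range(len(links)))


def is_strongly_orientable(n, edges, arcs):
    """Theorem 4.2: both conditions must hold."""
    return (is_connected_mixed(n, edges, arcs)
            and is_bridge_connected(n, list(edges) + list(arcs)))
```

For example, a triangle with arcs (0,1), (1,2) and the edge {2,0} passes both tests, while a single edge {0,1} is connected but fails the bridge test.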

This theorem gives rise to the following obvious sequential algorithm: orient
the edges one by one. At each step assign the edge an orientation that maintains
connectedness of the graph. If no such orientation exists, the graph is not strongly
orientable. Otherwise, the resulting digraph will be strongly connected. In [BT] it
is remarked that an algorithm based on depth-first search exists for mixed graphs
too. Indeed, such an algorithm (which achieves linear running time) appears in
[CGT].
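
A minimal sketch of the obvious one-by-one algorithm follows. It is deliberately naive (one connectivity test per candidate orientation), the function names are ours, and vertices are assumed to be numbered 0..n-1.

```python
def is_connected_mixed(n, edges, arcs):
    """True if the underlying directed graph is strongly connected."""
    directed = list(arcs) + [(u, v) for u, v in edges] + [(v, u) for u, v in edges]

    def reaches_all(pairs):
        succ = [[] for _ in range(n)]
        for u, v in pairs:
            succ[u].append(v)
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in succ[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    return reaches_all(directed) and reaches_all([(v, u) for u, v in directed])


def orient_one_by_one(n, edges, arcs):
    """Orient the edges one at a time, keeping the mixed graph connected.

    Returns the final list of arcs, or None if some edge admits no orientation
    that maintains connectedness (the graph is then not strongly orientable).
    """
    if not is_connected_mixed(n, edges, arcs):
        return None
    arcs = list(arcs)
    pending = list(edges)
    while pending:
        u, v = pending.pop()
        for candidate in ((u, v), (v, u)):
            if is_connected_mixed(n, pending, arcs + [candidate]):
                arcs.append(candidate)
                break
        else:
            return None      # neither orientation keeps the graph connected
    return arcs
```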

4.2.2  Parallel  Algorithms  for  Undirected  Strong  Orientation 

As  indicated  above,  both  variants  of  the  strong  orientation  problem  are 
efficiently solvable sequentially. Several papers have appeared about parallel
algorithms for strong orientation of undirected graphs. Atallah [Atal] presented the
problem as one of interest for parallel computation, and gave an $O(\log^2 n)$
solution using $O(n^3)$ processors on a CREW PRAM ($O(\log n)$ on a CRCW PRAM). Two
later papers give algorithms based on Atallah's method, and are aimed at reducing
the number of processors. Tsin's algorithm [Tsin] runs in time $O(\log^2 n)$ using
$O(n^2 / \log^2 n)$ processors on a CREW PRAM. Note that this gives an optimal
processor-time product in the case of dense graphs. Vishkin [Vish2] gives two
implementations, one that has the same complexity as that of Tsin, and one that
runs in time $O(\log n)$ using n + m processors on a CRCW PRAM (where m is the
number of edges). A simplified algorithm with the same complexity as Vishkin's
algorithm appears in [TV].

We  give  a  brief  high-level  description  of  Atallah’s  algorithm,  which  is  imple¬ 
mented  in  [Tsin]  and  [Vish2]  as  well.  The  first  idea  is  to  use  an  arbitrary  spanning 
tree,  T,  as  opposed  to  the  depth-first  spanning  tree  used  in  the  sequential  algo¬ 
rithm.  Each  edge  not  in  T  induces  a  fundamental  cycle  (which  is  the  unique  cycle 
consisting  of  that  edge  together  with  a  subset  of  the  edges  of  T).  The  observation  is 
that if we orient the edges of each fundamental cycle in a consistent way along the
cycle then the resulting orientation is strong. A problem might arise in that a tree
edge might be contained in several fundamental cycles, and might receive
conflicting orientations. To resolve this, the idea of assigning priorities to
cycles is used. Each fundamental cycle is given a distinct priority, and the
orientation assigned to an edge is according to the fundamental cycle of highest
priority that contains it. It is not hard to see that all edges can be assigned
orientations in parallel, and an NC algorithm follows.

When the input graph is mixed, it is clear that we need a different method.
The basic difference is that in the mixed case some of the edge orientations have
been determined, so the way in which we can orient the rest of the edges is
constrained. Furthermore, since U(G) is not necessarily connected, there is no
obvious analog to Atallah's algorithm that works for mixed graphs.


4.3  Strong  Orientation  of  Mixed  Graphs 

Our  algorithm  works  in  three  stages,  after  checking  that  the  input  graph  is 
strongly  orientable.  This  checking  involves  testing  if  the  underlying  undirected 
graph  of  G  is  bridge-connected  and  the  underlying  directed  graph  of  G  is  strongly 
connected  (the  conditions  of  theorem  4.2). 

In  the  first  stage  we  orient  a  subset  of  the  undirected  edges  according  to  direc¬ 
tions  imposed  by  existing  directed  arcs.  If,  for  some  arc,  we  find  a  cycle  in  which  it 
lies,  and  orient  the  edges  contained  in  that  cycle  in  a  consistent  fashion  along  the 
cycle,  then  its  endpoints  will  lie  in  the  same  strongly  connected  component.  We 
want  to  do  this  in  parallel  for  many  arcs.  For  this  we  use  the  idea  of  assigning 
priorities  mentioned  above  ([Atal]).  In  our  case  priorities  are  given  to  arcs  of  G.  In 
order  to  gain  efficiency,  we  do  not  apply  this  operation  to  all  the  arcs  of  G,  but  only 
to  a  "spanning  forest"  of  D(G),  which  we  prove  to  be  sufficient  for  our  purposes. 
After completing this stage, the resulting mixed graph, G', is such that all directed
arcs lie within strongly connected components.

In the second stage we orient undirected edges going between strongly connected
components of D(G') to obtain the graph G". In this stage the only use of
the directed arcs is in determining the strongly connected components. Our main
theorem is that the graph D(G") (the directed subgraph of G") is strongly
connected.

In  the  third  stage  we  simply  assign  an  arbitrary  orientation  to  any  edge  in 
G"  that  is  still  undirected. 

The algorithm is listed formally, followed by its complexity analysis and proof
of correctness.

procedure STRONG-ORIENTATION(G)

(0) Check that the two conditions of theorem 4.2 for strong orientability hold
for G. If not, abort.

(1)

(1.1) Find a spanning forest of the underlying undirected graph of D(G).
Call this set of arcs F.

(1.2) In parallel assign distinct integer priorities to all arcs of F: priority
f(a) for arc a.

(1.3) For all arcs, $a \in F$, do in parallel: let $a = (u,v)$. Find a simple path,
$p_a$, from v to u in G. Assign each undirected edge in $p_a$ a temporary
orientation with priority f(a) according to the direction of $p_a$. ((Note:
edges with temporary orientations are still considered undirected.))

(1.4) For all undirected edges, e, do in parallel: if e has at least one
temporary orientation, orient it according to the temporary orientation of
highest priority.

Call the resulting graph G'.

(2) Construct the undirected multigraph H = U(S(G')) (whose vertices are
strongly connected components of D(G') and whose edges are those going
between strongly connected components).

Orient the edges of H by some strong orientation algorithm for undirected
graphs (e.g. [Vish2]). Call the resulting mixed graph G".

(3) In parallel assign arbitrary orientations to all undirected edges of G".

end STRONG-ORIENTATION

Complexity analysis and implementation details: We claim that our algorithm
can be implemented to run in time $O(\log n)$ using $O(M(n))$ processors on a CRCW
PRAM. (Recall that M(n) is the number of processors required to multiply two
matrices, or to compute the transitive closure of a matrix.) We indicate how each of
the steps can be implemented with these resources.

Step (0): Testing if the underlying graph is bridge-connected can be done by finding
biconnected components [TV]. Testing if the graph is connected can be done by
transitive closure.

Step (1.1): Finding a spanning forest can be done in time $O(\log n)$ using $O(n + m)$
processors (where n is the number of vertices of G and m is the total number of
edges and arcs) by a modification of [SV] indicated in [TV].

Step (1.2): The arc (i,j) can be given the priority im + j. It is easy to see that no
two arcs will get the same priority.

Step (1.3): The set of paths, $p_a$, can be found in the following way - construct
matrices $P^{(0)}, P^{(1)}, \ldots, P^{(s)}$, where $s = \lfloor \log_2 n \rfloor$, as follows:

$$P^{(0)}[i,j] = \begin{cases} j & \text{if } (i,j) \in A(G) \text{ or } \{i,j\} \in E(G) \\ 0 & \text{otherwise} \end{cases}$$

$$P^{(r)}[i,j] = \begin{cases} P^{(r-1)}[i,j] & \text{if } P^{(r-1)}[i,j] \ne 0 \\ k & \text{if } P^{(r-1)}[i,j] = 0,\ P^{(r-1)}[i,k] \ne 0 \text{ and } P^{(r-1)}[k,j] \ne 0 \text{ for some } k \\ 0 & \text{otherwise} \end{cases}$$

The meaning of these matrices is - if $P^{(r)}[i,j] = k$ then k is a vertex lying on a
path from vertex i to vertex j such that the distances from i to k and from k to j
are both no more than $2^r$. Note that $P^{(r)}$ can be computed from $P^{(r-1)}$ using
$O(M(n))$ processors in constant time on a common write PRAM. Now, a path from
i to j can be reconstructed as follows - create an array A[0:d], where $d = 2^{s+1}$ (the
smallest power of two strictly larger than n). Initially set A[0] = i, A[d] = j and all
other entries to zero. The array will be filled in s + 1 steps (steps 0,1,...,s): in step r,
if $A[x] \ne 0$, $A[y] \ne 0$ and for all z, x < z < y, A[z] = 0, then set
$A[(x+y)/2] = P^{(s-r)}[A[x], A[y]]$. In each step, all entries for that step can be filled
in parallel in constant time, so the array can be filled in time $O(\log n)$ with $O(n)$
processors. Reading the array from index 0 to index d gives the path from i to j
with possible repetitions of vertices (i.e. each vertex of the path appears in a
consecutive set of indices of the array).
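
The doubling construction and the array-filling reconstruction can be simulated sequentially as in the sketch below. It is not a literal transcription: for the invariants to be easy to check we assume that every vertex reaches itself ($P^{(0)}[i,i] = i$), we take s to be the smallest integer with $2^s \ge n$, and we use an array of length $2^s + 1$, so the indexing is shifted by one level relative to the text. Vertices are numbered 1..n so that 0 can mean "no path". The function names are ours, and the result is a walk from i to j (consecutive vertices joined by a traversable step), which need not be simple.

```python
def build_tables(n, steps):
    """Return [P0, P1, ..., Ps]; `steps` are ordered pairs (i, j) traversable i -> j."""
    s = max(1, (n - 1).bit_length())          # smallest s >= 1 with 2**s >= n
    p0 = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        p0[i][i] = i                          # assumption: zero-length step allowed
    for i, j in steps:
        p0[i][j] = j
    tables = [p0]
    for _ in range(s):
        prev = tables[-1]
        cur = [row[:] for row in prev]        # keep entries that are already nonzero
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                if cur[i][j] == 0:
                    for k in range(1, n + 1):
                        if prev[i][k] and prev[k][j]:
                            cur[i][j] = k     # k is a midpoint between i and j
                            break
        tables.append(cur)
    return tables


def reconstruct(tables, i, j):
    """Fill the array by repeated midpoint lookups; None if j is unreachable from i."""
    s = len(tables) - 1
    if tables[s][i][j] == 0:
        return None
    d = 2 ** s
    a = [0] * (d + 1)
    a[0], a[d] = i, j
    gap = d
    for r in range(s, 0, -1):                 # on a PRAM each level is one parallel step
        for x in range(0, d, gap):
            a[x + gap // 2] = tables[r][a[x]][a[x + gap]]
        gap //= 2
    walk = [a[0]]
    for v in a[1:]:
        if v != walk[-1]:                     # drop consecutive repetitions
            walk.append(v)
    return walk
```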

Step (1.4): This can be done in constant time on a PRIORITY CRCW PRAM using
$O(n^2)$ processors - have a processor for each index of each of the arrays generated
in step (1.3). Say processor p is in charge of index l of the array A, corresponding
to the path from i to j. p checks if $A[l] \ne A[l+1]$. If so, and if $\{A[l], A[l+1]\}$ is an
undirected edge of the graph, then p writes the direction $(A[l], A[l+1])$ into the
memory cell designated for holding the orientation of the edge $\{A[l], A[l+1]\}$. This
can be simulated in time $O(\log n)$ with the same number of processors by any other
PRAM model using standard simulations.
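
The effect of the priority write can be mimicked sequentially: among all temporary orientations proposed for an edge, the one of highest priority wins. The sketch below assumes each proposal is given as a pair (priority, (tail, head)); the function name is ours.

```python
def resolve_orientations(proposals):
    """Map each undirected edge {u, v} to its highest-priority proposed direction."""
    best = {}
    for priority, (tail, head) in proposals:
        key = frozenset((tail, head))         # the undirected edge
        if key not in best or priority > best[key][0]:
            best[key] = (priority, (tail, head))
    return {edge: direction for edge, (_, direction) in best.items()}
```

For instance, if edge {1,2} receives the direction (1,2) with priority 3 and the direction (2,1) with priority 7, it is oriented as (2,1).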

Steps (2) and (3): After constructing U(S(G')) (transitive closure), Vishkin's
algorithm ([Vish2]) can be used to compute the orientation. Step (3) is clearly easy.

Next we prove the correctness of the algorithm. For simplicity of presentation
we assume that the input graph, G, is strongly orientable (i.e. that it passes
the tests of step (0)).

Lemma  4.1:  Let  e  =  {u,v}  be  an  undirected  edge,  which  becomes  oriented  at  stage 
1  of  the  algorithm.  Then  u  and  v  belong  to  the  same  strongly  connected  component 
of  D(G'). 

Proof:  It  is  helpful  to  view  stage  1  as  happening  in  phases:  first  orient  all  edges 
with temporary orientation of the highest priority; next orient all unoriented edges
with  second  highest  priority,  and  so  on.  We  proceed  to  prove  the  claim  stated  in 
the  lemma  by  induction  on  f(a),  the  temporary  orientation  with  highest  priority 
that  e  receives  in  step  1.2. 

The  base  case  is  when  f(a)  is  the  highest  priority  over  all  arcs  of  G.  In  this 
case  all  edges  of  pa  are  oriented  in  a  consistent  fashion  along  some  cycle,  so  follow¬ 
ing  this  u  and  v  lie  on  a  directed  cycle,  thus  in  the  same  strongly  connected  com¬ 
ponent. 

For the induction step assume that the claim holds for all edges with temporary
orientation of higher priority than f(a). If all edges in $p_a$ are oriented to
form a cycle with a then, again, u and v will lie on a cycle. Assume some edges,
$e_i = \{w_i, x_i\}$, in $p_a$ have been oriented in previous phases counter to the
direction of $p_a$. By the induction hypothesis, for all i, $w_i$ and $x_i$ are mutually
reachable via directed paths. Thus after the current phase there will be directed
paths from v to u and from u to v. □

Lemma 4.2: Let a = (u,v) be an arc in the spanning forest, F, computed in step
(1.1). Then u and v belong to the same strongly connected component of D(G').

Proof: Similar to the proof of lemma 4.1. □

Lemma 4.3: The digraph D(G') is either strongly connected, or consists of several
isolated strongly connected components.

Proof: Let a = (u,v) be an arc of G'. If a was an arc of G then u and v were
connected by some (not necessarily directed) path in the spanning forest, F, found in
step (1.1). The fact that a lies inside a strongly connected component of D(G')
follows from repeated application of lemma 4.2. If a was not an arc of G, it must
have been an edge that was oriented in stage 1, and by lemma 4.1 lies inside a
strongly connected component of D(G'). □

Lemma  4.4:  The  mixed  graph,  G',  is  strongly  orientable. 

Proof: The second condition of theorem 4.2 clearly holds: the underlying
undirected graph of G' is the same as that of G, and thus has no bridges. Assume
the first condition is violated, i.e. there exists a pair of vertices, u and v, such that
there is a (mixed) path, p, from u to v in G, but not in G'. Let x be the last vertex
in p that is reachable from u in G', and let y be the first nonreachable vertex. It
follows that {x,y} is an undirected edge in G that was oriented in stage 1 to become
(y,x) in G'. But by lemma 4.1 there is a (directed) path from x to y in G'. A
contradiction. □

Lemma 4.5: The undirected multigraph, U(S(G')) (whose vertices are the strongly
connected components of D(G') and edges are undirected edges between
components) is strongly orientable.

Proof: By lemma 4.3, the only edges between strongly connected components of G'
are undirected. Thus U(S(G')) = S(G'). By lemma 4.4, G' is strongly orientable,
so, clearly, S(G') must also be strongly orientable. □

Theorem 4.3: The directed graph, D(G"), is strongly connected.

Proof: By lemma 4.5, stage 2 of the algorithm yields a strong orientation of S(G').
Thus in the resulting directed graph, D(G"), every vertex is reachable from any
other. □


4.4 Minimum Strong Augmentation of Graphs and Digraphs

The  problems  we  consider  in  this  section  are:  find  a  minimum  augmentation 
to  make  an  undirected  graph  bridge-connected  and  to  make  a  digraph  strongly  con¬ 
nected.  Linear  time  sequential  algorithms  for  both  these  problems  appear  in  [ET]. 
Our parallel solution for the undirected case is a fairly straightforward
parallelization of that in [ET]. Our solution in the directed case is quite different.

4.4.1  Making  a  Graph  Bridge-Connected 

The  algorithm  for  making  an  undirected  graph,  G,  bridge-connected  is  essen¬ 
tially  that  of  [ET].  The  purpose  of  this  section  is  to  show  that  this  algorithm  has  a 
fast  and  efficient  parallel  implementation  and  to  obtain  some  insight  towards  solv¬ 
ing  the  problem  for  mixed  graphs.  The  proof  of  correctness  of  the  algorithm  below 
appears  in  [ET],  and  will  not  be  repeated  here. 

procedure UNDIRECTED-AUGMENTATION(G)

(0) Set $A = \emptyset$. ((A is the augmenting set of edges.))

(1) Construct BC(G), the bridge-component graph of G: vertices are bridge-
connected components of G and edges are edges between bridge-connected
components. ((Note that BC(G) is a forest.))

(2) Let $T_1, \ldots, T_k$ be the connected components of BC(G). Let $L_{2i-1}$ and $L_{2i}$ be
distinct leaves of $T_i$ for all i (or the same leaf if $T_i$ is an isolated vertex). In
parallel, for all i < k, add to A an edge from some vertex of the bridge-
connected component $L_{2i}$ to some vertex of $L_{2i+1}$.

(3) Let G' be the augmented graph after step (2). Note that BC(G') is a tree.
Pick some non-leaf vertex, r, of BC(G') and root BC(G') at r.

(4) Number the vertices of BC(G') in preorder.

(5) Let L(1), L(2), ..., L(m) be the leaves of BC(G') in increasing preorder
number. For all $i \le \lceil m/2 \rceil$ do in parallel: add to A an edge from some vertex of
the bridge-connected component L(i) to some vertex of $L(i + \lfloor m/2 \rfloor)$.

(6) Return A.

end UNDIRECTED-AUGMENTATION
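
Steps (4) and (5) can be illustrated on a small tree. The sketch below takes the tree BC(G') directly as input (vertices 0..n-1, an edge list, and a non-leaf root), lists its leaves in preorder and pairs them as in step (5), using 0-based indices. The helper `has_bridge` is a naive check added only to confirm the result; both function names are ours.

```python
def leaf_pairing(n, tree_edges, root):
    """Pair the i-th leaf with the (i + floor(m/2))-th leaf, in preorder."""
    adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    leaves, seen, stack = [], {root}, [root]
    while stack:
        u = stack.pop()
        kids = [v for v in adj[u] if v not in seen]   # unseen neighbours = children
        seen.update(kids)
        if not kids:
            leaves.append(u)                          # leaves appear in preorder
        stack.extend(reversed(kids))                  # first child is visited first
    m = len(leaves)
    half = m // 2
    return [(leaves[i], leaves[i + half]) for i in range((m + 1) // 2)]


def has_bridge(n, edges):
    """True if deleting some single edge disconnects the graph (naive test)."""
    def connected(es):
        adj = [[] for _ in range(n)]
        for u, v in es:
            adj[u].append(v)
            adj[v].append(u)
        seen, stack = {0}, [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    return any(not connected(edges[:i] + edges[i + 1:]) for i in range(len(edges)))
```

On a star with centre 0 and leaves 1, 2, 3 the pairing is (1,2), (2,3), and adding these two edges leaves no bridge.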

Complexity analysis and implementation details: We indicate how the procedure
can be implemented to run in time $O(\log n)$ using $O(n + m)$ processors on a
CRCW PRAM, where the input graph, G, has n vertices and m edges: finding the
bridge-connected  components  of  G  can  be  done  by  finding  biconnected  components 
([TV]).  This  is  because  the  bridges  of  G  are  exactly  the  blocks  containing  a  single 
edge.  The  bridge-components  are  the  connected  components  of  G  without  its 
bridges.  Rooting  a  tree  and  finding  a  preorder  numbering  can  be  done  using  the 
"Euler  tour"  technique  of  [TV]  within  the  stated  resource  bounds. 


4.4.2  Making  a  Digraph  Strongly  Connected 

Let G be a digraph. A source of G is a vertex with no incoming arcs. A sink
of G is a vertex with no outgoing arcs. Note that according to our definitions an
isolated vertex is both a source and a sink. Recall that S(G) is the acyclic digraph
of the strongly connected components of G. An augmentation of S(G) corresponds
to an augmentation of G as follows: an arc from $C_1$ to $C_2$ in S(G) corresponds to an
arc from some vertex of the component $C_1$ to some vertex of the component $C_2$ in
G. The following easily proven lemmas appear in [ET].


Lemma  4.6:  An  augmentation  makes  S(G)  strongly  connected  if  and  only  if  the 
corresponding  augmentation  makes  G  strongly  connected. 


Lemma 4.7: A lower bound on the number of arcs needed to make G strongly
connected is:

$$sa(G) = \begin{cases} 0 & \text{if } G \text{ is strongly connected} \\ \max\{\text{number of sources of } S(G),\ \text{number of sinks of } S(G)\} & \text{otherwise} \end{cases}$$
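
As an illustration, sa(G) can be computed as in the sketch below. The strongly connected components are obtained from a naive sequential transitive closure, which is an assumption of this sketch; the function name is ours and vertices are numbered 0..n-1.

```python
def sa_lower_bound(n, arcs):
    """Return sa(G) for the digraph on vertices 0..n-1 with the given arcs."""
    reach = [[i == j for j in range(n)] for i in range(n)]
    for u, v in arcs:
        reach[u][v] = True
    for k in range(n):                         # Warshall-style transitive closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    # Label each component by its smallest vertex.
    comp = [min(j for j in range(n) if reach[i][j] and reach[j][i]) for i in range(n)]
    comps = set(comp)
    if len(comps) == 1:
        return 0                               # G is strongly connected
    has_in = {comp[v] for u, v in arcs if comp[u] != comp[v]}
    has_out = {comp[u] for u, v in arcs if comp[u] != comp[v]}
    sources = len(comps - has_in)              # components with no incoming arc
    sinks = len(comps - has_out)               # components with no outgoing arc
    return max(sources, sinks)
```

For the path 0 → 1 → 2 this gives 1, and for two isolated vertices it gives 2, since each isolated vertex counts as both a source and a sink.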


Our algorithm finds an augmenting set of arcs of size sa(G), which strongly
connects S(G). By lemmas 4.6 and 4.7 this augmentation corresponds to a
minimum augmentation of G. The following lemma allows us to focus our
attention only on sources and sinks of S(G):

Lemma 4.8: Let A be an augmentation that strongly connects all the sources and
sinks of S(G) (i.e. merges them into one strongly connected component). Then A
strongly connects S(G).

Proof: Since S(G) is acyclic, every vertex, v, of S(G) lies on a path from some
source, x, to some sink, y. Thus, in the augmented digraph, v lies on a closed trail
containing x and y, and therefore is contained in the same strongly connected
component as the sources and sinks of S(G). □

Motivated by this lemma we define the undirected bipartite graph
$B(G) = (X, Y, E)$ where:

X = set of sources of S(G).

Y = set of sinks of S(G).

$E = \{\{x,y\} \mid \text{there is a (possibly empty) path from } x \text{ to } y \text{ in } S(G)\}$

Lemma 4.9: B(G) has no isolated vertices.

Proof: An isolated vertex of S(G) appears in B(G) as a pair of vertices, $x \in X$, $y \in Y$,
connected by an edge. For a source which is not isolated in S(G), there must be a
path from it to some sink, since S(G) is finite and acyclic. Similarly, every
non-isolated sink is reachable from some source. □

It turns out that the notion of matching in the graph B(G) is helpful in
obtaining a minimum augmentation for strongly connecting G. A
maximal matching is a matching that is not properly contained in any other
matching.

Theorem 4.4: Let $M = \{\{x_1,y_1\}, \ldots, \{x_k,y_k\}\}$ be a maximal matching of B(G),
where the $x_i$'s are sources of S(G) and the $y_i$'s are sinks. Let
$A = \{(y_1,x_2), \ldots, (y_{k-1},x_k), (y_k,x_1)\}$ be an augmentation of S(G), and let G' be
the augmented digraph. Then:

(1) sa(G') = sa(G) - k.

(2) In S(G') every sink is reachable from every source.

Proof: The matching, M, together with the augmentation, A, constitute a spanning
cycle of the vertices $x_1, y_1, \ldots, x_k, y_k$. Thus all these vertices are contained in
one strongly connected component of G'. Call this component C. If S(G) has
exactly k sources and k sinks (i.e. M is a perfect matching) then G' is strongly
connected (by lemma 4.8), and the theorem holds. Otherwise, if x is a source not
covered by M then, by lemma 4.9, x has some neighbor, y, in B(G). Since M is
maximal, y must be covered by M (otherwise $M \cup \{\{x,y\}\}$ is a matching properly
containing M). Therefore there is a path from x to C in G'. Similarly, if there is a
sink not covered by M, there is a path from C to it in G'. Both parts of the
theorem follow. □

The  following  lemma  shows  the  importance  of  the  second  part  of  the  theorem: 

Lemma  4.10:  Let  G  be  an  acyclic  digraph  in  which  every  sink  is  reachable  from 
every  source,  and  let  A  be  an  augmentation  of  G  in  which  every  arc  is  from  a  sink 
to  a  source.  If  A  contains  at  least  one  arc  into  every  source  and  at  least  one  arc  out 
of  every  sink  then  A  makes  G  strongly  connected. 

Proof:  Picture  the  arcs  of  A  as  being  added  to  G  one  by  one,  in  an  arbitrary  order. 
The  conditions  of  the  lemma  imply  that  after  an  arc  is  added,  its  endpoints  lie  in 
the  same  strongly  connected  component.  Thus,  after  adding  all  the  arcs  of  A,  all 
the  sources  and  sinks  of  G  lie  in  the  same  strongly  connected  component,  and  by 
lemma 4.8, G becomes strongly connected. □

At  this  point  we  have  enough  ingredients  for  an  NC  algorithm  for  our  prob¬ 
lem:  given  G,  construct  B(G)  and  find  a  maximal  matching  in  it;  augment  G  with 
an  appropriate  set  of  arcs  according  to  theorem  4.4;  finally,  augment  the  resulting 
graph  in  the  way  indicated  by  lemma  4.10.  Theorem  4.4  and  lemma  4.10  imply 
that  the  augmentation  is,  indeed,  of  minimum  size. 

A  careful  look  at  the  proof  of  theorem  4.4  reveals  that  the  matching  in  the 
statement  of  the  theorem  need  not  be  maximal.  The  property  we  really  want  is 
that every unmatched vertex has a matched neighbor. Let a dense matching be a
matching for which every non-isolated vertex is adjacent to a matched vertex.
Then we have:

Corollary 4.1: Let M be a dense matching of B(G). Then the statement of theorem
4.4 holds for M.

The distinction between a maximal matching and a dense matching is important
for us, since we can find a dense matching in $O(\log n)$ parallel time, whereas
the fastest processor-efficient algorithm known for finding a maximal matching
runs in time $O(\log^3 n)$ ([IS]). The basis for a fast and efficient algorithm is the
following lemma:

Lemma 4.11: Let F be a spanning forest of a graph, G. Then a dense matching of F
is also a dense matching of G.

Proof: Let v be a non-isolated vertex of G. Then v is non-isolated in F. □


Our  parallel  algorithm  follows. 

procedure DIRECTED-AUGMENTATION(G)

(0) Set $A = \emptyset$. ((A is the augmenting set of arcs.))

(1) Find the strongly connected components of G and construct B(G).

(2) Find M = DENSE-MATCHING(B(G)).

(4) Let $\{x_0,y_0\}, \ldots, \{x_{k-1},y_{k-1}\}$ be the edges of M. Put in A an arc from a
vertex of the strongly connected component corresponding to $y_i$ to a vertex of
$x_{(i+1) \bmod k}$ for all i, $0 \le i < k$.

(5) Let $a_0, \ldots, a_{h-1}$ be the sources uncovered by M and $b_0, \ldots, b_{l-1}$ be the sinks
uncovered by M. Let m = min{h,l}. Add to A an arc from a vertex of the
strongly connected component corresponding to $b_{i \bmod m}$ to a vertex of $a_{i \bmod m}$
for all i.

(6) Return A.

end DIRECTED-AUGMENTATION

procedure DENSE-MATCHING(G)

(( Returns a dense matching of an undirected graph, G ))

(0) Set $M = \emptyset$. ((M is the matching.))

(1) Find a spanning forest, F, of G. The rest of the procedure is performed on
all connected components of F in parallel.

(2) Let T be a connected component of F. Pick an arbitrary vertex, r, of T,
and make T rooted at r.

(3) For all vertices, v, of T in parallel, compute the distance of v from the root
r.

(4) For all non-leaf vertices, v, of even distance from r do in parallel: pick one
of v's children, u, and add {u,v} to M.

(5) For all vertices, v, of odd distance from r do in parallel: if v is unmatched
and it has an unmatched child then pick such a child, u, and add {u,v} to M.

(6) Return M.

end DENSE-MATCHING
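
A sequential simulation of DENSE-MATCHING is sketched below. It assumes a breadth-first spanning forest (any spanning forest would do) and picks the first available child wherever the procedure says "pick one"; the function names are ours and vertices are numbered 0..n-1. The helper `is_dense_matching` checks the definition directly.

```python
from collections import deque


def dense_matching(n, edges):
    """Return a dense matching of the graph as a set of frozenset({u, v}) edges."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    depth = [None] * n
    children = [[] for _ in range(n)]
    for r in range(n):                        # steps (1)-(3): one rooted tree per component
        if depth[r] is not None:
            continue
        depth[r] = 0
        queue = deque([r])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if depth[v] is None:
                    depth[v] = depth[u] + 1
                    children[u].append(v)
                    queue.append(v)
    mate = [None] * n
    for v in range(n):                        # step (4): even non-leaf vertices
        if depth[v] % 2 == 0 and children[v]:
            u = children[v][0]
            mate[v], mate[u] = u, v
    for v in range(n):                        # step (5): unmatched odd vertices
        if depth[v] % 2 == 1 and mate[v] is None:
            free = [u for u in children[v] if mate[u] is None]
            if free:
                mate[v], mate[free[0]] = free[0], v
    return {frozenset((v, mate[v])) for v in range(n) if mate[v] is not None}


def is_dense_matching(n, edges, matching):
    """Check that `matching` is a matching and every non-isolated vertex has a matched neighbor."""
    matched = set()
    for e in matching:
        if matched & e:
            return False                      # two matching edges share a vertex
        matched |= e
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return all(nbrs[v] & matched for v in range(n) if nbrs[v])
```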

Lemma  4.12:  The  set  of  edges,  M,  computed  by  procedure  DENSE-MATCHING  is 
a  dense  matching  in  the  given  graph,  G. 

Proof:  That  M  is  a  matching  follows  from  the  fact  that  the  parent  in  a  tree  is 
unique.  By  lemma  4.11,  we  need  only  show  that  M  is  dense  in  every  component  of 
the  spanning  forest,  F ,  of  G.  If  v  is  a  vertex  of  odd  distance  from  the  root  of  its 
component  then  its  parent  is  matched  at  step  (4).  If  v  is  a  non-leaf  vertex  of  even 
distance  from  the  root  then  it  is  matched  to  one  of  its  children  in  step  (4).  If  it  is  a 
leaf  of  even  distance  then  its  parent  is  matched  either  in  step  (4)  or  in  step  (5).  [] 

Complexity analysis and implementation details: First we show that
DENSE-MATCHING(G) runs in time $O(\log n)$ using $O(n + m)$ processors on a
CRCW PRAM, where n is the number of vertices of G and m is the number of
edges: A spanning forest, F, can be computed with these bounds by a modification
of the Shiloach-Vishkin algorithm ([SV]) indicated in [TV]. Rooting a tree at a
vertex can be done using the "Euler tour" technique of [TV]. Finding distances
from the root is performed by a standard "doubling" technique - $\lceil \log n \rceil$ stages,
where at stage k every vertex points to its ancestor of distance $2^k$ (or to the root, if
it is closer than $2^k$). Steps (4) and (5) are very local in nature, and can clearly be
executed with the stated resources (by having vertices try to match up with their
parents, and solving conflicts by the concurrent write feature).

Finally we observe that the only expensive step in
DIRECTED-AUGMENTATION is step (1), which can be done by a transitive closure
computation, in time $O(\log n)$ using $O(M(n))$ processors.


4.5  Minimum  Strong  Augmentation  of  Mixed  Graphs 

In this section we give an NC algorithm for the problem of finding a
minimum  augmenting  set  of  arcs  for  a  given  mixed  graph,  G,  such  that  the  result¬ 
ing  mixed  graph  is  strongly  orientable.  By  theorem  4.2  we  can  phrase  the  problem 
as: 

First  formulation:  Augment  G  with  a  minimum  set  of  arcs  such  that: 

(1) The underlying directed graph of the augmented graph is strongly
connected.

(2) The underlying undirected graph of the augmented graph is
bridge-connected.

From the previous section we know that in order for an augmentation to
accomplish (1) it needs to contain arcs incident to all sources and sinks of the
strong component graph of the underlying directed graph of G (to be called "super
sources" and "super sinks" below). In order for an augmentation to accomplish (2)
it needs to contain arcs incident to all leaves of the bridge-component graph of the
underlying undirected graph of G (to be called "super leaves"). A hard aspect of
the problem in this formulation is that super sources and sinks can intersect super
leaves in many ways. One can find a minimum set of vertices hitting all the super
sources, sinks and leaves, but it is not clear how to proceed after finding such a set.

A different formulation leads to our solution. Recall that, for a digraph, D,
sa(D) is the maximum of the number of sources of S(D) and the number of sinks of
S(D) (or zero, if D is strongly connected).

Second  formulation:  Orient  the  edges  of  a  given  mixed  graph,  G,  such  that  for 
the  resulting  digraph,  D,  sa(D)  is  minimized. 

Lemma 4.13: The second formulation yields a minimum strong augmentation for G.

Proof: We can view our task as augmenting G to become strongly orientable and
then orienting its edges to obtain a strongly connected digraph. If we switch the
order of the operations (first orient to obtain D and then augment D), we know
that the size of the required augmentation is exactly sa(D). Thus a minimum
augmentation is one for which sa(D) is minimized. []

The first step of our algorithm is to orient the "strongly orientable" edges of
G. We say that e = {u,v} is strongly orientable if there is a path from u to v or from
v to u in G - {e}. We can orient these edges by applying to G the strong orientation
algorithm of section 3 with the simple modification of skipping steps (0) and (3).
After this step the endpoints of each strongly orientable edge lie on a directed
cycle, so the partial orientation obtained is clearly optimal. The proof of correctness
of this step is similar to the proofs of section 2.

Let G' be the mixed graph obtained after the first step. Let F = U(S(G'))
(recall that this is the undirected part of the mixed graph whose vertices are the
strongly connected components of D(G'), the directed part of G'). Since there are
no strongly orientable edges in G' it follows that F is a forest. Furthermore,
orienting edges of F will not create new directed cycles, and thus we have:

Lemma 4.14: Let G'' be any digraph obtained by orienting the edges of G'. Then
the strongly connected components of G'' are the same as those of D(G').

Our goal is to find an orientation that minimizes sa(G''). Lemma 4.14 says
that by orienting edges of F we will not change the strong component structure of
the graph. What we can do is to possibly decrease the number of sources and sinks.
For  example,  if  F  contains  an  edge  between  a  source  and  a  sink  we  can  orient  it 
from  the  sink  to  the  source,  thus  eliminating  one  source  and  one  sink. 

We  state  our  problem  in  terms  of  supply  and  demand.  Each  vertex  of  F  has 
one  of  four  possible  labels: 

I - a source vertex (demanding an arc Into it)

O - a sink vertex (demanding an arc Out of it)

IO - an isolated vertex, which is both a source and a sink

X - a vertex with no demands (neither a source nor a sink).

A vertex is unsatisfied if the orientation does not provide it with the arc(s) it
demands. Our task is to orient the edges of F so as to minimize
max {# of unsatisfied I's, # of unsatisfied O's}.

Our approach is the following: let T be a connected component of F. We
compute the following numbers of sources and sinks among vertices of T remaining
after orienting the edges of T:

i(T) = the minimum possible number of unsatisfied sources

o(T) = the minimum possible number of unsatisfied sinks

t(T) = the minimum possible total number of unsatisfied sources and sinks.

These numbers can be computed by a simple case analysis of the labels of
vertices of T. The case analysis appears in the listing of the algorithm below. After
calculating i(T), o(T) and t(T) for all components of F, we perform a simple global
analysis to decide, for each component, which of its vertices are to remain sources
and which are to remain sinks, and orient its edges accordingly. Finally we
augment the resulting digraph using the procedure DIRECTED-AUGMENTATION of
the previous section.

procedure MIXED-AUGMENTATION(G)

(1) Apply to G steps (1) and (2) of the strong orientation algorithm of section
3. Call the resulting mixed graph G'.

(2) Compute F = U(S(G')) and label each vertex of F by one of four labels -
I, O, IO, X for source, sink, isolated vertex and other, respectively.

(3) For all connected components, T, of F do in parallel:

(3.1) Let k be the number of leaves of T labeled IO.

(3.2) If all vertices of T are IO then mark k-2 IO leaves as "free" and
set

i(T) = 1, o(T) = 1, t(T) = k

(3.3) If all vertices of T are I or IO (at least one I) then mark k-1 IO
leaves as "free" and set

i(T) = 1, o(T) = 0, t(T) = max{k,1}

(3.4) If all vertices of T are O or IO (at least one O) then mark k-1 IO
leaves as "free" and set

i(T) = 0, o(T) = 1, t(T) = max{k,1}

(3.5) In all other cases mark all k IO leaves as "free" and set

i(T) = 0, o(T) = 0, t(T) = k

(4) Let

i = sum of i(T)'s of all connected components, T, of F.
o = sum of o(T)'s of all connected components, T, of F.
t = sum of t(T)'s of all connected components, T, of F.

If i ≥ o then set N_i = max{⌈t/2⌉, i}, and N_o = t - N_i.

If i < o then set N_o = max{⌈t/2⌉, o}, and N_i = t - N_o.

((Remark: N_i and N_o are, respectively, the number of sources and sinks in the
resulting digraph.))

(5) Mark the first N_i - i "free" vertices of F as "designated source", and all the
other "free" vertices as "designated sink".

(6) For all connected components, T, of F do in parallel:
ORIENT-LABELED-TREE(T).

(7) Let G'' be the resulting digraph from the steps so far. Compute
A = DIRECTED-AUGMENTATION(G'').

(8) Return A.

end MIXED-AUGMENTATION
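
The case analysis of step (3) and the global balancing of step (4) can be sketched as below. This is a hedged illustration only: the function names, the string labels and the `is_leaf` input are assumptions made for the example, and clamping the number of "free" leaves at zero for very small trees is our own guard rather than part of the listing.

```python
def component_counts(labels, is_leaf):
    """Step (3) for one tree T of F.

    labels[v] is one of "I", "O", "IO", "X"; is_leaf[v] tells whether v is a
    leaf of T. Returns (i_T, o_T, t_T, number of IO leaves marked "free").
    """
    k = sum(1 for lab, leaf in zip(labels, is_leaf) if lab == "IO" and leaf)
    kinds = set(labels)
    if kinds == {"IO"}:                      # case (3.2)
        return 1, 1, k, max(k - 2, 0)
    if kinds <= {"I", "IO"}:                 # case (3.3): at least one I
        return 1, 0, max(k, 1), max(k - 1, 0)
    if kinds <= {"O", "IO"}:                 # case (3.4): at least one O
        return 0, 1, max(k, 1), max(k - 1, 0)
    return 0, 0, k, k                        # case (3.5)


def global_targets(i, o, t):
    """Step (4): numbers N_i, N_o of sources and sinks to leave unsatisfied."""
    half = -(-t // 2)                        # ceil(t / 2)
    if i >= o:
        n_i = max(half, i)
        return n_i, t - n_i
    n_o = max(half, o)
    return t - n_o, n_o
```

For example, with i = 3, o = 1 and t = 5 the targets are N_i = 3 and N_o = 2, so the later call to DIRECTED-AUGMENTATION would need max{3, 2} = 3 arcs.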

procedure ORIENT-LABELED-TREE(T)

((Orients a tree with labels on its vertices.))

(1) Find a root, R, according to the following cases:

(1.1) If T has an X vertex, v, set R = v.

(1.2) Else, if T has an I or "designated sink" vertex, v, and an O or
"designated source" vertex, u, then orient the path, p, between v and u
in the direction from u to v, and set R = p.

(1.3) Else, select an arbitrary non-leaf, v, and set R = v.

(2) Root T at R.

(3) In parallel do for each edge, e, which has not been oriented in step (1): if
any leaf in the subtree below e is labeled either I or "designated sink", then
orient e away from the root, R. Otherwise orient e towards the root.

end ORIENT-LABELED-TREE

Lemma 4.15: After orienting a tree, T, by procedure ORIENT-LABELED-TREE,
the total number of remaining sources and sinks is t(T), and vertices marked
"designated sources" and "designated sinks" become, respectively, sources and
sinks.

Proof: A simple inductive proof shows that the only vertices which remain as
sources or sinks after the orientation are IO leaves and, if selected at step (1.3),
the root, R. One can verify that the number of such vertices is t(T). Furthermore,
it is not hard to see that except for i(T) vertices which remain as sources and o(T)
vertices which remain as sinks, all the vertices are either satisfied or become what
they are designated to be. []

Theorem 4.5: The procedure MIXED-AUGMENTATION computes a minimum
augmenting set of arcs that makes a given mixed graph, G, strongly orientable.

Proof: The discussion before the algorithm listing proves that the general scheme
is correct. We leave out the simple yet tedious proof that the i, o and t values
computed in step (3) are accurate. Given these numbers, it is clear that the
minimum total number of sources is i, the minimum total number of sinks is o and
the minimum total number of sources and sinks is t. Thus the optimal way to
designate sources and sinks is to try to have no more than ⌈t/2⌉ sources and no
more than ⌈t/2⌉ sinks. This is what we do in steps (4) and (5) within the bounds set
by i and o. Finally, lemma 4.15 shows that the numbers of sources and sinks
remaining after orienting the edges of F are, indeed, the desired numbers. []

Complexity analysis and implementation details: First we show that
ORIENT-LABELED-TREE runs in O(log n) time using O(n + m) processors: steps
(1) and (2) can be done using the "Euler tour" technique of [TV]. Step (3) can be
implemented by the tree contraction method of Miller and Reif ([MR]). Next we
note that all steps except (1) of MIXED-AUGMENTATION can also be
implemented with O(n + m) processors, since the most complicated operation is
computing connected components. Finally, the most expensive step is step (1), which
was shown (in section 4.3) to require O(log n) time and O(M(n)) processors.


CHAPTER FIVE

ZERO-ONE  SUPPLY-DEMAND  PROBLEMS 


5.1  Introduction 

Supply-demand  problems  are  fundamental  in  combinatorial  optimization 
([FF], [Lawler]).  In  one  formulation  of  the  problem  the  input  is  a  network  in  which 
each  arc  has  a  non-negative  capacity,  and  each  vertex  has  a  certain  supply  or 
demand  (possibly  zero).  The  task  is  to  find  a  flow  function,  such  that  the  flow 
through  each  arc  is  no  more  than  its  capacity  and  the  difference  between  the  flow 
into  a  vertex  and  out  of  it  is  equal  to  its  supply  (or  demand).  This  problem  is 
equivalent  to  the  general  max  flow  problem,  and  can,  therefore,  be  solved 
efficiently  sequentially  ([Lawler], [PS], [GT]),  but  probably  has  no  efficient  parallel 
solution, since it is P-complete ([GSS]). There are, however, many interesting
special cases of this problem whose solutions do not require the full power of
general max flow.

In this chapter we are concerned with several such problems. The first
problem we discuss is: given a sequence of supplies, a_1, ..., a_n, and demands,
b_1, ..., b_m, construct a zero-one flow pattern satisfying these constraints, where
every supply vertex can send at most one unit of flow to each demand vertex.
Equivalently, we can state this problem as that of constructing a zero-one matrix,
M, having a_i 1's in the i-th row and b_j 1's in the j-th column (for all
1 ≤ i ≤ n, 1 ≤ j ≤ m). We will refer to this problem as the
matrix construction problem. M is called a realization for the input (a,b). There is
a simple sequential algorithm for constructing a realization if one exists
([FF],[Gale]): select any row, assign its 1's to the columns having largest column
sums and repeat this procedure in the reduced problem. If this procedure gets
stuck (i.e. some column sum becomes negative), then no realization exists. Note
that this procedure is similar to the one described in section 3.1 for constructing a
tournament with a specified degree sequence.
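
The sequential greedy procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `greedy_realization` is hypothetical, rows are processed in input order (the text allows any order), and ties between equal column sums are broken by column index.

```python
def greedy_realization(a, b):
    """Return a 0-1 matrix with row sums a and column sums b, or None.

    Each row places its 1's in the columns with the largest remaining sums.
    """
    n, m = len(a), len(b)
    if sum(a) != sum(b):
        return None
    remaining = list(b)                      # column sums of the reduced problem
    M = [[0] * m for _ in range(n)]
    for i in range(n):
        if a[i] > m:
            return None                      # a row cannot hold more than m ones
        # Columns ordered by remaining sum, largest first (stable on ties).
        cols = sorted(range(m), key=lambda j: -remaining[j])[:a[i]]
        for j in cols:
            if remaining[j] == 0:
                return None                  # the column sum would become negative
            M[i][j] = 1
            remaining[j] -= 1
    return M
```

For a = (2, 1) and b = (2, 1) the sketch returns the matrix with rows (1, 1) and (1, 0). Each row depends on the column sums left by all earlier rows, which is the sequential dependency that makes the procedure hard to parallelize.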

This algorithm, although easy to implement sequentially, seems very hard to
parallelize. Thus it is natural to ask if there is a fast parallel algorithm for this
problem. Two remarks are relevant to this question: first, the problem can be
solved by network flow techniques. Since the capacities are small (polynomial in
the size of the flow network), there are Random NC algorithms for the problem by
reduction to maximum matching ([KUW2],[MVV]). Second, there is a simple
sequential method for testing whether an instance, (a,b), is realizable
([FF],[Berge]). It is based on partial sums of the sequences, and can be
implemented in NC in a straightforward manner. However, this method does not yield
a way of constructing a realization. This is another example of the apparent gap
between search and decision problems in the parallel realm ([KUW1]).

We present a deterministic NC algorithm for the matrix construction problem.
Our algorithm can be implemented to run in time O(log^4 |M|) using O(|M|·(n+m))
processors on a CRCW PRAM, or in time O(log^3 |M|) using O(|M|·(n+m)^3)
processors on an EREW PRAM, where M is the realization matrix with n rows and m
columns and |M| is the size of M (i.e. n·m). When n = Θ(m) the number of
processors is O(|M|^1.5) and O(|M|^2.5) respectively.

The algorithm is based on a careful examination of the network flow
formulation of the problem. It exploits the fact that there are only a polynomial
number of cuts which need to be considered, and that this set of potentially min
cuts has a natural ordering associated with it.

The  methodology  we  develop  enables  us  to  solve  the  following  two  related 
problems  (with  the  same  time  and  processor  bounds): 

(1) The symmetric supply-demand problem - given a sequence of positive and
negative integers summing to zero, representing supplies and demands
respectively, construct a zero-one flow pattern so that the net flow out of (into)
each vertex is its supply (demand), where every vertex can send at most one unit
of flow to every other vertex. Notice that this problem is quite different from the
matrix construction problem, since it does not have a "bipartite" nature.

(2) The digraph construction problem - construct a simple directed graph with
specified in- and out-degrees. This corresponds to constructing a zero-one matrix
with specified row and column sums, where the diagonal entries are forced to be
zero. [FF] and [Berge] give a simple sequential algorithm when the in- and
out-degrees are sorted in the same order (i.e. a vertex with higher in-degree has
higher out-degree). Our algorithm is the only one we know of for general orders
that does not use max flow.

In  the  last  section  we  extend  our  results  to  the  case  where  the  input 
represents  upper  bounds  on  supplies  and  lower  bounds  on  demands. 

5.2  The  Matrix  Construction  Problem 
5.2.1  The  Slack  Matrix 

Our parallel algorithm is based on a careful analysis of the network flow
formulation of the problem. The main tool we use is what we call the slack matrix,
which is similar to the "structure matrix" of Ryser [Ryser]. In order to define the
slack matrix, we need to look at the solution to our problem by network flow.
Given the input a_1 ≥ a_2 ≥ ... ≥ a_n, b_1 ≥ b_2 ≥ ... ≥ b_m, we construct a
flow network, N, as shown in fig. 5.1: the vertex set consists of a source, s, a sink,
t, vertices u_i, 1 ≤ i ≤ n, corresponding to rows and vertices v_j, 1 ≤ j ≤ m,
corresponding to columns. The arc set contains three types of arcs: for all
1 ≤ i ≤ n, 1 ≤ j ≤ m there are arcs (s,u_i) of capacity a_i, (v_j,t) of capacity b_j
and (u_i,v_j) of capacity 1.


Fig.  5.1  :  Flow  network  for  solving  the  0-1  matrix  construction  problem 

Let

\[ S = \sum_{i=1}^{n} a_i = \sum_{j=1}^{m} b_j . \]

Clearly the max flow value in N is bounded by S. Furthermore, a flow which
satisfies all row and column sums is of value S. It follows
(by the max flow - min cut theorem) that the problem instance (a,b) is realizable if
and only if every directed cut in N has capacity at least S.

Let C = (C^s : C^t) be a directed cut in N (i.e. the vertices are partitioned into
two sets, C^s, C^t such that s ∈ C^s, t ∈ C^t). Say C^s contains x vertices from the set
{u_1, ..., u_n} and m-y vertices from {v_1, ..., v_m}. Observe that if we replace u_j by
u_i in C^s, for some i<j, then the capacity of the cut can only decrease. Similarly,
replacing v_k by v_l in C^s can only decrease the capacity of the cut, for l>k. It
follows that the capacity of C is no less than the capacity of the cut C_{x,y}, where
C^s_{x,y} = {s} ∪ {u_1, ..., u_x} ∪ {v_{y+1}, ..., v_m}. Thus there are only n·m cuts,
{C_{x,y} | 1 ≤ x ≤ n, 1 ≤ y ≤ m}, which are potential min cuts. The cut C_{x,y} is shown in
fig. 5.2. Therefore, necessary and sufficient conditions for the instance (a,b) to be
realizable are that for every 1 ≤ x ≤ n, 1 ≤ y ≤ m:

\[ \mathrm{capacity}(C_{x,y}) = \sum_{i=x+1}^{n} a_i + \sum_{j=y+1}^{m} b_j + x \cdot y \ge S \]

\[ \iff \sum_{i=x+1}^{n} a_i + \Bigl(S - \sum_{j=1}^{y} b_j\Bigr) + x \cdot y \ge S \]

\[ \iff \sum_{i=x+1}^{n} a_i - \sum_{j=1}^{y} b_j + x \cdot y \ge 0 \]

Definition: The slack of C_{x,y} of problem instance (a,b) is:

\[ sl_{a,b}(x,y) = \sum_{i=x+1}^{n} a_i - \sum_{j=1}^{y} b_j + x \cdot y \]

The slack matrix, SL_{a,b}, is the matrix whose ij-th entry is sl_{a,b}(i,j).

Proposition 5.1: The instance (a,b) is realizable if and only if SL_{a,b} is
non-negative.
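
As an illustration of Proposition 5.1, the slack matrix can be computed directly from its definition using partial sums. The sketch below assumes that both sequences are given in non-increasing order; the function names are hypothetical, and the explicit check that the two sequences have the same total reflects the definition of S rather than a separate condition in the proposition.

```python
def slack_matrix(a, b):
    """SL[x-1][y-1] = sum_{i>x} a_i - sum_{j<=y} b_j + x*y, for 1<=x<=n, 1<=y<=m.

    Assumes a and b are sorted in non-increasing order.
    """
    n, m = len(a), len(b)
    total = sum(a)
    SL = []
    row_prefix = 0
    for x in range(1, n + 1):
        row_prefix += a[x - 1]
        tail = total - row_prefix            # sum of a_i over i > x
        col_prefix = 0
        row = []
        for y in range(1, m + 1):
            col_prefix += b[y - 1]           # sum of b_j over j <= y
            row.append(tail - col_prefix + x * y)
        SL.append(row)
    return SL


def is_realizable(a, b):
    """Realizability test: equal totals and a non-negative slack matrix."""
    return sum(a) == sum(b) and all(v >= 0 for row in slack_matrix(a, b) for v in row)
```

For a = (2, 1), b = (2, 1) the slack matrix has rows (0, 0) and (0, 1), so the instance is realizable and three of its four cuts have slack zero.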

Proposition 5.2: Let (a,b) be an instance which is realizable by some matrix, M,
and assume that sl_{a,b}(x,y) = 0. Then:

(1) M[i,j] = 1 for all 1 ≤ i ≤ x, 1 ≤ j ≤ y

(2) M[i,j] = 0 for all x+1 ≤ i ≤ n, y+1 ≤ j ≤ m

Proof: Since sl_{a,b}(x,y) = 0, the cut C_{x,y} has capacity S, which means that in any
max flow forward arcs (1) are all saturated, and backward arcs (2) all have zero
flow. This situation is shown in fig. 5.2. []
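
Proposition 5.2 fixes two blocks of M once a cut C_{x,y} has slack zero, which leaves two independent sub-instances. The bookkeeping below is our own derivation from the proposition rather than something spelled out in the text: rows 1..x already contain y ones in columns 1..y, and columns 1..y already contain x ones in rows 1..x. The function name is hypothetical.

```python
def split_at_tight_cut(a, b, x, y):
    """Sub-instances left after fixing the blocks forced by a tight cut (x, y).

    Returns ((rows, cols) for M[1:x, y+1:m], (rows, cols) for M[x+1:n, 1:y]),
    using the 1-based block notation of the text.
    """
    # Rows 1..x have y forced ones, so a_i - y ones remain for columns y+1..m.
    top_right = ([a[i] - y for i in range(x)], list(b[y:]))
    # Columns 1..y have x forced ones, so b_j - x ones remain for rows x+1..n.
    bottom_left = (list(a[x:]), [b[j] - x for j in range(y)])
    return top_right, bottom_left
```

With a = (2, 1), b = (2, 1) and the tight cut (x, y) = (1, 1), both sub-instances are the 1-by-1 instance with row sum 1 and column sum 1, which gives the realization with rows (1, 1) and (1, 0).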


Fig. 5.2 : A tight cut - sl_{a,b}(x,y) = 0.
All forward arcs are saturated; all backward arcs have flow 0.


If sl_{a,b}(x,y) = 0, we will call C_{x,y} a tight cut. Proposition 5.2 shows that
the existence of a tight cut simplifies the solution considerably. In fact it gives rise
to a divide and conquer approach: if C_{x,y} is tight, constructing a matrix
M[1:n, 1:m] for the original problem is reduced to constructing the two
sub-matrices, M[x+1:n, 1:y] and M[1:x, y+1:m]. Of course, we are not always lucky
enough to have a tight cut. Our approach is to perturb the input so as to improve
our luck! Here is a high-level description of our algorithm:

(1) Perturb the inputs, (a,b). Call this new instance (a',b').

(2) Recursively solve the instance (a',b'). Call the solution M'.

(3) Correct the matrix M' to obtain a matrix, M, which solves the original
instance, (a,b).

How do we perturb an instance? A basic perturbation can be viewed as shifting one
unit from the poor to the rich in order to make the situation tighter: subtract 1
from a_k and add 1 to a_l for some k>l. We do not allow a perturbation to
change the ordering of the a_i's, so it is necessary that a_k > a_{k+1} and
a_l < a_{l-1} before the perturbation.

Remark: We will be discussing only perturbations of the row sums (the a_i's). All
this discussion holds for perturbation of the column sums as well.

Proposition 5.3: Let (a,b) be a problem instance, and let (a',b) be obtained by
shifting one unit from a_k to a_l for some k>l. Then sl_{a',b}(x,y) = sl_{a,b}(x,y) - 1 if
l ≤ x < k, and sl_{a',b}(x,y) = sl_{a,b}(x,y) otherwise.

Proof: This can be seen by looking at the formula for sl. []
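
Proposition 5.3 can be checked numerically on a small example. The sketch below uses hypothetical helper names and 1-based indices k and l as in the text; it evaluates the slack formula directly and does not require the instance to be realizable.

```python
def slack(a, b, x, y):
    """sl(x, y) = sum_{i>x} a_i - sum_{j<=y} b_j + x*y, with 1-based x and y."""
    return sum(a[x:]) - sum(b[:y]) + x * y


def shift_unit(a, k, l):
    """Basic perturbation: move one unit from a_k to a_l (1-based, k > l)."""
    a2 = list(a)
    a2[k - 1] -= 1
    a2[l - 1] += 1
    return a2
```

Only the partial sums that contain a_k but not a_l change, which is exactly the range l ≤ x < k. For a = (4, 3, 2, 1) and k = 3, l = 2, the perturbed row sums are (4, 4, 1, 1) and only the slacks in row x = 2 decrease.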

This proposition shows that a basic perturbation reduces the slack of a certain
set of cuts, and leaves the rest unchanged. This observation is the basis for our
algorithm.

5.2.2  One  Phase  of  Perturbations 

Achieving poly-log recursion depth for the basic algorithm described in the
previous section is a non-trivial matter. The reason is that it is hard to control
which cut or cuts will become tight. Furthermore, since we have limited ourselves
to perturbations that do not change the ordering of the a_i's, it is not clear that a
tight cut can always be obtained.

Say we are shifting units from a_k to a_l (for some k>l). How many units can
we shift? Viewing the unit shifting as a sequential process (i.e. shifting one unit at
each time step), we can shift until one of three things happens:

(1) a_l becomes equal to a_{l-1}.

(2) a_k becomes equal to a_{k+1}.

(3) sl_{a,b}(x,y) becomes zero, for some l ≤ x < k.

In case (3) progress is made, since a tight cut is created, and we can split the
problem into two smaller problems. What about the first two cases? We observe that
we have possibly reduced the number of different a_i values. This observation is the
key to our approach for performing perturbations.

Definition: The complexity of an instance (a,b), comp(a,b), is the product of the
number of different a_i values and the number of different b_j values.

Our parallel algorithm works in phases. The input to a perturbation phase is
an instance of certain complexity, say K, and the output is one or more instances,
each having complexity bounded by c·K, for some constant c<1. Finally, if the
complexity of the input is less than a certain constant, B, we construct a
realization for it (this is the base case). We proceed to describe one perturbation
phase. In this discussion we will derive the constants c and B. For better exposition
we will first describe a phase as a sequential process. The parallel implementation
will be explained later.

In  each  phase  either  row  sums  or  column  sums  are  perturbed.  The  sequence 
that  is  perturbed  (row  or  column  sums)  is  that  which  has  a  larger  number  of 
different  values.  We  will  discuss  a  phase  in  which  row  sums  are  perturbed. 
Phases  in  which  column  sums  are  perturbed  are  essentially  identical. 

A phase starts by selecting a consecutive set of active rows, {h, h+1, ..., l}. The
parameters h and l depend on the input, (a,b), and its complexity, K, and will be
derived later. Let L = a_{l+1} and H = a_{h-1}. The perturbation is performed as
follows: repeatedly shift units from the lowest active row (initially row l) to the
highest active row (initially row h). A row becomes inactive, and stops sending or
receiving units, when its row sum either drops to L or reaches H. The phase
terminates when one of two things happens:

(1) At most one active row is left.

(2) sl_{a,b}(x,y) becomes zero, for some h ≤ x < l.

In case (1) no tight cuts have been obtained, but the row sums of all the active
rows (except, possibly, one) have become either L or H. Therefore the number of
different row values decreases.

In case (2) one or more tight cuts are created, and the instance can be split, using
proposition 5.2, into two smaller instances ("smaller", in this case, means fewer
rows and lower complexity).

Let α, β and γ be the number of different values in the sets {a_1, ..., a_{h-1}},
{a_h, ..., a_l} and {a_{l+1}, ..., a_n} respectively. We want to select these
parameters so as to minimize the complexity of the outputs of the phase.

Case (1) : The number of different row sums remaining is bounded by α + γ + 1
(since the β values corresponding to active rows disappeared, except for at most
one).

Case (2) : Zero slack is obtained for one or more rows in the range [h, l-1]. A
simple calculation shows that the number of different row sums in the resulting
instances is bounded either by α + β + 1 or by β + γ + 1.

Thus we need to minimize the maximum of α + β + 1, α + γ + 1 and β + γ + 1
subject to α + β + γ = K (where K = comp(a,b)). The solution is, of course, to have
α, β and γ as equal as possible, i.e. all roughly K/3. From this calculation one can
see that the complexity can be reduced by these perturbations as long as the
number of different row values is more than 5.

To summarize, if the input to a phase has complexity K, the outputs have
complexity bounded by ⌈2K/3⌉ + 1. Thus the total number of phases is O(log(n·m)).
The base case is any instance with at most 5 different row values and 5 different
column values.

We  next  discuss  the  parallel  implementation  of  one  perturbation  phase.  The 
first  step  is  to  calculate  the  new  row  sums  and  slack  matrix  under  the  assumption 
that  none  of  the  cuts  become  tight.  If  this  new  slack  matrix  is  strictly  positive 
then,  indeed,  we  are  in  case  (1). 

Let p be the initial number of active rows (p = l-h+1). After the phase
(assuming case (1)), there will be q rows of value H, p-q-1 rows of value L and
one row of value I, where H > I ≥ L. q and I are easy to calculate:

\[ q = \Bigl\lfloor \frac{\sum_{i=h}^{l} (a_i - L)}{H - L} \Bigr\rfloor \]

\[ I = L + \Bigl( \sum_{i=h}^{l} (a_i - L) \Bigr) \bmod (H - L) \]
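
The values q and I can be checked against a direct simulation of the sequential unit-shifting process. The sketch below covers case (1) only, so it ignores slacks entirely. It assumes L < H and that the active row sums are given in non-increasing order; treating rows that already equal L or H as inactive from the start is our reading of the text, and the function names are hypothetical.

```python
def simulate_phase(rows, L, H):
    """Shift units one at a time from the lowest to the highest active row."""
    a = list(rows)
    active = [i for i, v in enumerate(a) if L < v < H]
    while len(active) > 1:
        hi, lo = active[0], active[-1]       # highest and lowest active rows
        a[hi] += 1
        a[lo] -= 1
        # A row becomes inactive once it reaches H or drops to L.
        active = [i for i in active if L < a[i] < H]
    return a


def closed_form(rows, L, H):
    """q rows end at H and one row ends at I = L + remainder."""
    total = sum(v - L for v in rows)         # units available above the floor L
    q, r = divmod(total, H - L)
    return q, L + r
```

For active row sums (6, 5, 3) with L = 2 and H = 8, the simulation ends with (8, 4, 2), matching q = 1 and I = 4. The closed form is what makes the parallel implementation possible, since it avoids stepping through the shifts one at a time.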

Let m_i = min{sl_{a,b}(i,y) | 1 ≤ y ≤ m}, and let m_i' be the new minimum slack in
row i after the phase is completed (assuming case (1)). Then:

\[ m_i' = m_i - \sum_{j=h}^{i} (H - a_j) \quad \text{for } h \le i < h+q \]

\[ m_i' = m_i - \sum_{j=i+1}^{l} (a_j - L) \quad \text{for } h+q \le i < l \]

If all the m_i' are positive, then we are provably in case (1). If not, we need to
detect at what "time step" (during the "sequential process") the first tight cut was
created. This turns out to be a simple task for the following reason: if we plot the
value of any entry in the slack matrix as a function of time, it decreases by one
unit each step until some point in time, and remains constant from that point on.
Thus a row where the first zero slack occurs is a row for which m is minimum
among the rows that have m' ≤ 0. The total number of units shifted in the phase is
this minimum m value. It is easy to compute the new row sums given the number
of units shifted.

In both cases ((1) and (2)) we need to calculate the number of units shifted
from row j to row i, for every h ≤ i < j ≤ l. (These numbers will be used later, in
the correction phase.) This calculation can be performed by a simple partial-sums
computation.

5.2.3  Correcting  a  Perturbed  Solution 

After a realization is obtained for the perturbed instance we need to correct it
in order to obtain a realization for the original instance. Clearly the required task
is to shift units back to their original rows. The rows which participate in the
shifting of units are divided into two sets - the donors and the receivers, where
donors shift units to the receivers during the perturbation phase, and get them
back at the correction phase. Note that no row is both a donor and a receiver in
any given phase. Let s(j,i) be the number of units shifted from the donor j to
the receiver i in the perturbation phase.

Definition:  Let  M  be  a  realization  matrix.  Sliding  a  unit  from  row  i  to  row  j 
means  changing  M[i,k]  from  1  to  0  and  M[j,k]  from  0  to  1  ,  for  some  column,  k. 

Lemma 5.1: Given any realization of the perturbed instance, M', it is always
possible to correct it by sliding s(j,i) units from receiver, i, to donor, j, for all
receivers and donors.

Proof: Again it is convenient to view the process of sliding units as a sequential
one. Assume that some of the units have been slid, but fewer than s(j,i) units have
been slid from row i to row j. Call the current matrix M_1. We will show that it is
possible to slide a unit from row i to row j in M_1, which proves the lemma.

Since units were shifted from row j to row i in the perturbation phase, it is
the case that a_j was no larger than a_i before the phase began. Other perturbations
in which rows i and j might have participated only increased the row sum of i and
decreased the row sum of j. Now, since fewer than s(j,i) units have been slid from
row i back to row j, it follows that row i has more 1's than row j in M_1. By the
pigeonhole principle there is some column, k, such that M_1[i,k] = 1 and
M_1[j,k] = 0. []
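
A single sliding step, as used in the proof of Lemma 5.1, can be sketched as follows. The function name `slide_unit` and the 0-based row indices are assumptions made for the example; picking the first suitable column is an arbitrary choice, since the lemma only needs some such column to exist.

```python
def slide_unit(M, i, j):
    """Slide one unit from row i to row j of the 0-1 matrix M, in place.

    Returns the column used, or None if no column has M[i][k] = 1, M[j][k] = 0.
    """
    for k in range(len(M[i])):
        if M[i][k] == 1 and M[j][k] == 0:
            M[i][k], M[j][k] = 0, 1
            return k
    return None
```

The slide lowers the sum of row i by one, raises the sum of row j by one, and leaves every column sum unchanged. By the pigeonhole argument above, a suitable column always exists whenever row i has more 1's than row j.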

The  implication  of  the  proof  above  is  that  we  do  not  need  to  be  very  careful  in 
the  way  we  slide  units.  The  main  problem  we  need  to  solve  is  that  conflicts  may 
arise  when  we  slide  many  units  in  parallel.  This  could  happen  since  a  donor  might 
have  shifted  units  to  many  receivers,  and  a  receiver  might  have  received  from 
many  donors.  Our  goal  is  to  break  down  the  problem  into  a  set  of  independent 
problems,  which  can  all  be  solved  in  parallel.  The  first  step  is  to  get  a  formal 
description  of  the  donor-receiver  relation. 

Definition: The donation graph $G = (D,R,E)$ is a bipartite graph with a vertex,
$d_j \in D$, representing each donor and a vertex $r_i \in R$ representing each receiver, such
that the edge $\{d_j, r_i\}$ is in E if and only if $s(j,i) > 0$.

The  following  lemma  plays  a  key  role  in  simplifying  the  situation: 


Lemma  5.2:  The  donation  graph,  G,  is  a  forest. 

Proof: Call a neighbor of a vertex, v, nontrivial if it has at least one other
neighbor besides v. It follows from the way the perturbations were performed that each
vertex,  v,  has  at  most  two  nontrivial  neighbors,  one  that  became  inactive  before  v, 
and  one  that  became  inactive  after  v.  Furthermore,  all  the  vertices  can  be  ordered 
according  to  when  they  became  inactive.  Therefore  G  cannot  contain  any  cycles. 

[] 

One  can  see  that  a  matching  in  the  donation  graph,  G,  corresponds  to  an 
independent  set  of  sliding  problems.  However,  there  is  no  guarantee  that  the  edges 
of  G  can  be  partitioned  into  a  small  set  of  matchings,  since  G  might  have  vertices 
of  high  degree.  Thus  a  more  subtle  partition  is  required. 

Definition:  A  constellation  is  a  subgraph  of  a  given  graph  all  of  whose  connected 
components  are  stars  (where  a  star  is  a  tree  with  at  most  one  non-leaf  vertex). 

Lemma 5.3: The edges of a forest can be partitioned into two (edge-disjoint)
constellations.

Proof: It suffices to show that the edges of a tree can be partitioned into two
constellations. Let $T = (V,E)$ be a tree, and take it to be rooted at some vertex, P.
The level of a vertex is its distance from P. v is the parent of u if $\{u,v\} \in E$ and v is
closer to P than u. The partition of T into two constellations,
$C_1 = (V,E_1)$, $C_2 = (V,E_2)$, is as follows:

$E_1 = \{\, \{u,v\} \mid u \text{ is the parent of } v, \text{ the level of } u \text{ is even} \,\}$

$E_2 = \{\, \{u,v\} \mid u \text{ is the parent of } v, \text{ the level of } u \text{ is odd} \,\}$

An  example  of  such  a  partition  is  shown  in  fig.  5.3.  [] 


Fig  5.3  :  Partitioning  a  tree  into  two  constellations 
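
The partition in the proof of lemma 5.3 can be sketched sequentially as follows. This is only an illustration, not the parallel implementation: the function name `constellations` and the adjacency-dictionary input are assumptions made here, and levels are computed by a plain breadth-first search.

```python
from collections import deque

def constellations(adj, root):
    """Split the edges of a tree into two constellations by parent level parity.

    adj  : dict mapping each vertex to a list of its neighbours (a tree)
    root : the vertex the tree is rooted at
    Returns (E1, E2), lists of (parent, child) pairs.
    """
    level = {root: 0}
    E1, E2 = [], []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:            # v is a child of u
                level[v] = level[u] + 1
                # the edge goes to E1 if the parent level is even, else to E2
                (E1 if level[u] % 2 == 0 else E2).append((u, v))
                queue.append(v)
    return E1, E2

tree = {0: [1, 2], 1: [0, 3, 4], 2: [0], 3: [1], 4: [1]}
E1, E2 = constellations(tree, 0)
```

In each part, a vertex at an odd (resp. even) level is incident only to the edge from its parent, so every connected component is a star centered at a parent.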


Our  solution  is  based  on  the  observation  that  a  constellation  corresponds  to  a  set  of 
independent  sliding  problems  which  we  can  solve  in  parallel.  Therefore  our 
approach  will  be  to  partition  the  donation  graph  into  two  constellations  and  then 
to  slide  units  in  two  stages  -  first  corresponding  to  one  constellation  and  then  to 
the  other. 

A  star  in  the  donation  graph  corresponds  to  several  donors  with  a  common 
receiver  or  several  receivers  with  a  common  donor.  These  two  cases  are  symmetric, 
so we will discuss only the first one. In what follows we describe a parallel
algorithm that slides all the units corresponding to a star with receiver R and donors
$D_1, \ldots, D_d$. Let M be a realization matrix of the perturbed instance we are about
to correct. Let $r, d_1, \ldots, d_d$ denote the number of 1's in rows $R, D_1, \ldots, D_d$
respectively and let $s_i = s(D_i,R)$. We need to slide $s_i$ units from R to $D_i$, for all
$1 \le i \le d$, in parallel. Our approach is to solve a matching problem in the following
bipartite graph, $B = (X,Y,E)$:

$X = \{\, x_j \mid M[R,j] = 1 \,\}$

$Y = \{\, y_{i,k} \mid 1 \le i \le d,\ 1 \le k \le s_i \,\}$

$E = \{\, \{x_j, y_{i,k}\} \mid M[D_i,j] = 0 \,\}$

Lemma 5.4: Every matching of B which covers all the vertices in Y corresponds to
sliding $s_i$ units from R to $D_i$, for all $1 \le i \le d$ simultaneously.

Proof: By construction, there are $\sum_{i=1}^{d} s_i$ vertices in Y, one corresponding to each
unit that was shifted from some $D_i$ to R. There is an edge between $x_j$ and $y_{i,k}$ if
and only if a unit can be slid from row R to row $D_i$ in column j. The claim is,
therefore, evident. []

At  first  sight  it  seems  that  we  need  to  solve  a  maximum  bipartite  matching 
problem,  but  closer  observation  reveals  the  following: 

Lemma  5.5:  Every  maximal  matching  in  B  is  maximum. 

Proof: It suffices to show that any matching which does not cover all the vertices
in Y can be extended. The degree of $y_{i,k}$ in B is, by definition, at least $r - d_i$.
Before the perturbation phase the row sum of R was no less than that of row $D_i$.
After the perturbations, the row sum of R increased by at least $\sum_{t=1}^{d} s_t$, and the row
sum of $D_i$ decreased by at least 1. Therefore:

For all i, k: $\mathrm{degree}(y_{i,k}) \ge r - d_i \ge \sum_{t=1}^{d} s_t + 1 = |Y| + 1$

Since any matching contains no more than |Y| edges, every uncovered vertex of Y
still has an unmatched neighbor in X, so no matching that fails to cover Y is
maximal. []

A maximal matching can be constructed efficiently in parallel ([IS],[Luby]).
Our parallel algorithm is, therefore, the following: construct the donation graph,
and partition it into two edge-disjoint constellations, $C_1$ and $C_2$. For each
component of $C_1$, construct the bipartite graph, B, as described, and find a maximal
matching, F, in it. For all edges of B do in parallel: if $\{x_j, y_{i,k}\} \in F$ then slide a unit
from R to $D_i$ in column j. Finally, repeat this procedure on $C_2$ (with the updated
matrix).


It  follows  from  lemmas  5.4  and  5.5  that  after  performing  these  operations  all 
the  perturbations  (of  the  current  phase)  are  corrected. 
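
To make the role of lemmas 5.4 and 5.5 concrete, here is a small sequential sketch of correcting one star. It is a stand-in under stated assumptions: the name `slide_star` and its arguments are hypothetical, and a simple greedy pass replaces the parallel maximal matching routines of [IS] and [Luby]. By lemma 5.5, any maximal matching (including a greedy one) covers Y whenever the degree bound holds.

```python
def slide_star(M, R, donors, s):
    """Slide s[i] units from row R to row donors[i], one column per unit.

    M      : 0/1 matrix as a list of lists (not modified; a copy is returned)
    R      : index of the receiver row
    donors : list of donor row indices D_1..D_d
    s      : s[i] = number of units to return to donors[i]
    """
    M = [row[:] for row in M]
    X = [j for j in range(len(M[R])) if M[R][j] == 1]   # columns holding a unit of R
    matched_cols = set()
    matching = []                                       # pairs (column j, donor position i)
    for i, D in enumerate(donors):
        for _k in range(s[i]):                          # one Y-vertex y_{i,k} per unit
            for j in X:
                # edge {x_j, y_{i,k}} exists iff M[D][j] == 0
                if j not in matched_cols and M[D][j] == 0:
                    matched_cols.add(j)
                    matching.append((j, i))
                    break
    for j, i in matching:                               # perform the slides
        M[R][j] = 0
        M[donors[i]][j] = 1
    return M

before = [[1, 1, 1, 1, 0],
          [1, 0, 0, 0, 0],
          [0, 0, 0, 0, 1]]
after = slide_star(before, 0, [1, 2], [1, 1])
```

Each slide moves a unit within a single column, so column sums are unchanged while the row sums move back toward their original values.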

5.2.4  The  Base  Case 

The base case for our algorithm is when the number of different values of row
and column sums is bounded by a constant (5). The problem is then characterized
by the different values: $a_1, \ldots, a_5$ and $b_1, \ldots, b_5$ and their multiplicities
$n_1, \ldots, n_5$ and $m_1, \ldots, m_5$ respectively. Let M be the realization matrix we
construct, and let $M_{i,j}$ be the submatrix of M induced on the rows with sum $a_i$ and
columns with sum $b_j$. We construct M in two steps:

Step 1: For each $i,j$, $1 \le i,j \le 5$, determine the number, $F_{i,j}$, of units in $M_{i,j}$.

Step 2: For each $i,j$, $1 \le i,j \le 5$, distribute the $F_{i,j}$ units between the different rows
and columns of $M_{i,j}$.

We carry out step 1 by constructing a flow network of constant size, and
finding a max flow in it. The network has twelve vertices: a source s, a sink t, five
"row" vertices $u_1, \ldots, u_5$, and five "column" vertices $v_1, \ldots, v_5$. The arcs are of
three kinds: arcs from s to each $u_i$ with capacities $n_i \cdot a_i$, from each $v_j$ to t with
capacities $m_j \cdot b_j$, and from each $u_i$ to each $v_j$ with capacities $n_i \cdot m_j$. This network is
simply  the  result  of  taking  the  original  network  flow  formulation  for  this  problem, 
and  compressing  all  "row"  vertices  with  equal  capacity  into  one  vertex,  and  simi¬ 
larly  for  "column"  vertices.  Since  this  network  is  of  constant  size,  a  max  flow  can 
be  constructed  in  constant  time  using  standard  sequential  methods. 
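
As an illustration of step 1, the compressed network can be built and solved with any textbook max-flow method. The sketch below uses a plain breadth-first augmenting-path search on a dense capacity matrix; the function name `compressed_flow` and the vertex numbering are assumptions made here, not part of the original algorithm.

```python
from collections import Counter, deque

def compressed_flow(a, b):
    """Return (rows, cols, F): distinct (value, multiplicity) pairs and F[i][j]."""
    rows = sorted(Counter(a).items(), reverse=True)   # (a_i, n_i), decreasing values
    cols = sorted(Counter(b).items(), reverse=True)   # (b_j, m_j)
    k, l = len(rows), len(cols)
    s, t, size = 0, k + l + 1, k + l + 2              # u_i = 1+i, v_j = 1+k+j
    cap = [[0] * size for _ in range(size)]
    for i, (a_i, n_i) in enumerate(rows):
        cap[s][1 + i] = n_i * a_i
        for j, (_, m_j) in enumerate(cols):
            cap[1 + i][1 + k + j] = n_i * m_j
    for j, (b_j, m_j) in enumerate(cols):
        cap[1 + k + j][t] = m_j * b_j
    flow = [[0] * size for _ in range(size)]
    while True:
        parent = {s: None}                            # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in range(size):
                if v not in parent and cap[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[u][v] - flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] += delta
            flow[v][u] -= delta                       # residual capacity on the reverse arc
    F = [[flow[1 + i][1 + k + j] for j in range(l)] for i in range(k)]
    return rows, cols, F

rows, cols, F = compressed_flow([2, 2, 1], [2, 2, 1])
```

When the instance is realizable, all source and sink arcs are saturated, so row i of F sums to $n_i \cdot a_i$ and column j sums to $m_j \cdot b_j$.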

In step 2 we convert the solution for the compressed network to a solution for
the original network by distributing the flow along each compressed arc evenly
between the arcs it defines. We do this by providing a solution for the following
problem: construct $M_{i,j}$ so that $x_{i,j}$ selected rows have each $r_{i,j}$ units, $y_{i,j}$ columns
have each $c_{i,j}$ units and each of the remaining rows and columns have $r_{i,j} - 1$ and
$c_{i,j} - 1$ units respectively. First, it is not hard to see that:

$r_{i,j} = \lceil F_{i,j}/n_i \rceil, \quad x_{i,j} = F_{i,j} \bmod n_i, \quad c_{i,j} = \lceil F_{i,j}/m_j \rceil, \quad y_{i,j} = F_{i,j} \bmod m_j$

Assume we want each of the first $x_{i,j}$ rows and first $y_{i,j}$ columns to have $r_{i,j}$ and $c_{i,j}$
units respectively. Our solution is to put the units of the first row in the first $r_{i,j}$
columns, the units of the second row in the cyclically next set of columns etc. An
example is shown in fig. 5.4.

Fig. 5.4: Structure of $M_{i,j}$ with 5 rows, 7 columns and 13 units.
Selected rows and columns are marked with arrows.


A  construction  for  arbitrary  sets  of  selected  rows  and  columns  (not  necessarily  the 
first  ones)  is  obtained  from  the  one  described  above  by  simply  permuting  the  rows 
and  columns  appropriately. 
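
The cyclic placement can be sketched in a few lines. The helper name `cyclic_block` is hypothetical, and the code handles the case where F is an exact multiple of the number of rows by giving every row F divided by the number of rows units, which is an interpretation adopted here for the sketch.

```python
def cyclic_block(n_rows, n_cols, F):
    """Place F units in an n_rows x n_cols 0/1 block, rows filled cyclically."""
    assert 0 <= F <= n_rows * n_cols
    M = [[0] * n_cols for _ in range(n_rows)]
    base, extra = divmod(F, n_rows)      # the first `extra` rows get one more unit
    col = 0
    for i in range(n_rows):
        units = base + 1 if i < extra else base
        for _ in range(units):
            M[i][col] = 1                # next unit goes in the cyclically next column
            col = (col + 1) % n_cols
    return M

block = cyclic_block(5, 7, 13)           # the dimensions used in fig. 5.4
```

With 5 rows, 7 columns and 13 units this gives three rows of 3 units and two rows of 2 units, and six columns of 2 units and one column of 1 unit.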

Now we are ready to construct a realization, M, for the base case. The values
$F_{i,j}$ determine the $x_{i,j}$ and $y_{i,j}$ values. All we need to ensure is that any two rows
(columns) with equal row (column) sums get selected the same number of times.
This can be done by selecting the first $x_{i,1}$ rows in $M_{i,1}$, the cyclically next set of
$x_{i,2}$ rows in $M_{i,2}$ and so on, and similarly for columns.

Since $\sum_{j=1}^{5} F_{i,j} = n_i \cdot a_i$, the total number of rows selected in
$\{M_{i,1}, \ldots, M_{i,5}\}$ is an integer multiple of $n_i$, and it follows that any two rows
with equal row sums are selected the same number of times. A similar argument
holds for columns. Thus the construction described yields a correct solution for the
base case.
5.2.5 The Algorithm

In this section we state the algorithm more formally. A few words about
notation: I.P is shorthand for "in parallel"; comments are between double parentheses;
i:k denotes a range of indices (in a matrix or a sequence); || denotes concatenation
of sequences; #A is the cardinality of the set A.

procedure MATRIX-CONSTRUCTION(a, b)

(( This is the recursive procedure for constructing a matrix, M, with given row
sums, a, and column sums, b. The row and column sums are assumed to be given
in a non-decreasing order. ))

(1) Let n = length of a; m = length of b.

(2) Compute $V_a$ and $V_b$, the number of different values in a and b resp.

(3) If $V_a \le 5$ and $V_b \le 5$ then return BASE-CASE(a, b).

(4) (ā, b̄, S, SL, pert, zerop) = PERTURBATION(a, b).

(5) If not zerop then M' = MATRIX-CONSTRUCTION(ā, b̄).

(6) Else let x, y be such that SL[x,y] = 0 and either $a_x$ is in the middle third of
the a values or $b_y$ is in the middle third of the b values. Do the following I.P.

(6.1) I.P set M'[i,j] = 1 for all $1 \le i \le x$, $1 \le j \le y$.

(6.2) I.P set M'[i,j] = 0 for all $x < i \le n$, $y < j \le m$.

(6.3) M'[x+1:n, 1:y] = MATRIX-CONSTRUCTION( ā[x+1:n], b̄[1:y] - x )

(6.4) M'[1:x, y+1:m] = MATRIX-CONSTRUCTION( ā[1:x] - y, b̄[y+1:m] )

(7) M = CORRECTION(M', S, pert).

(8) Return M.

end MATRIX-CONSTRUCTION

procedure PERTURBATION(a, b)

(( This procedure computes one perturbation phase. The inputs are row sums, a,
and column sums, b. The outputs are new row and column sums, ā and b̄ resp, the
slack matrix SL, the matrix of numbers of units shifted S, a variable pert indicating
whether row sums or column sums have been perturbed and a variable zerop
indicating if zero slack is obtained. ))

(1) Let n = length of a; m = length of b.

(2) Compute $V_a$ and $V_b$, the number of different values in a and b resp.

If $V_a > V_b$ then set pert = "rows". Else set pert = "columns" and perform the
rest of this routine with b, $V_b$ and m instead of a, $V_a$ and n resp.

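(3) Find h and l for which $a_{h-1} \ne a_h$, $a_l \ne a_{l+1}$, and the number of different
values in $\langle a_1, \ldots, a_{h-1} \rangle$ and $\langle a_h, \ldots, a_l \rangle$ are $\lfloor V_a/3 \rfloor$ and $\lceil V_a/3 \rceil$ resp. Let
$H = a_{h-1}$ and $L = a_{l+1}$.

(4) Compute $q = \left\lfloor \frac{\sum_{i=h}^{l} (a_i - L)}{H - L} \right\rfloor$ and $f = \sum_{i=h}^{l} (a_i - L) \bmod (H - L)$.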

(5) Compute SL[i,j] (( the slack matrix )) for all $1 \le i \le n$, $1 \le j \le m$ I.P.

(6) Compute $m_i = \min \{\, SL[i,j] \mid 1 \le j \le m \,\}$ for all $h \le i \le l$ I.P.

(7) Compute $m_i' = m_i - \sum_{j=h}^{i} (H - a_j)$ for all $h \le i < h+q$ I.P.

(8) Compute $m_i' = m_i - \sum_{j=i+1}^{l} (a_j - L)$ for all $h+q \le i \le l$ I.P.

(9) If $m_i' \ge 0$ for all $h \le i \le l$ then set $T = \sum_{i=h+q+1}^{l} (a_i - L) + \max\{0,\ a_{h+q} - L - f\}$.

Else set $T = \min \{\, m_i \mid m_i' < 0 \,\}$, and set zerop to true.

(10) Initialize S[i,j] = 0 for all $1 \le i,j \le n$.

(11) (a', S) = SHIFT-UNITS($\langle a_h, \ldots, a_l \rangle$, T, H, L).

(12) Set ā = $\langle a_1, \ldots, a_{h-1} \rangle$ || a' || $\langle a_{l+1}, \ldots, a_n \rangle$.

(13) Set $SL[i,j] = SL[i,j] - \sum_{k=h}^{i} \max\{0,\ a_k' - a_k\}$ for all $h \le i \le l$, $1 \le j \le m$ I.P.

(14) Return (ā, b̄, S, SL, pert, zerop).
end PERTURBATION


procedure SHIFT-UNITS(a, T, H, L)

(( Shifts a total of T units between active rows with row sums a. H is the upper
bound on new row sums and L is the lower bound. Returns the new row sums and
the matrix, S, of the numbers of units shifted between pairs of rows. ))

(1) Denote the elements of a by $a_h, \ldots, a_l$.

(2) Compute for all $1 \le i \le T$ I.P:

$d_i = \max \{\, j \mid i \le \sum_{k=j}^{l} (a_k - L) \,\}$ (( donor of unit i ))

$r_i = \min \{\, j \mid i \le \sum_{k=h}^{j} (H - a_k) \,\}$ (( receiver of unit i ))

(3) Compute $S[i,j] = \#\{\, k \mid d_k = i,\ r_k = j \,\}$ for all $h \le i,j \le l$ I.P.

(4) Compute $a_i' = a_i + \#\{\, k \mid r_k = i \,\} - \#\{\, k \mid d_k = i \,\}$ for all $h \le i \le l$ I.P.

(5) Return (a', S).
end SHIFT-UNITS
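
A sequential reading of SHIFT-UNITS may help in following the index arithmetic. The sketch below is an interpretation, not the thesis's parallel code: it uses 0-based indices over the active rows only, assumes the row sums are listed from largest to smallest, and assumes T does not exceed the total donor or receiver capacity.

```python
from itertools import accumulate

def shift_units(a, T, H, L):
    """Shift T units among active rows with sums a (largest first).

    Returns (new_sums, S) where S[d][r] counts units shifted from donor d
    to receiver r.
    """
    n = len(a)
    # recv_cap[j] = sum_{k <= j} (H - a[k]): room available in rows 0..j
    recv_cap = list(accumulate(H - x for x in a))
    # don_cap[j] = sum_{k >= j} (a[k] - L): units available in rows j..n-1
    don_cap = list(accumulate(x - L for x in reversed(a)))
    don_cap.reverse()
    S = [[0] * n for _ in range(n)]
    new_sums = list(a)
    for i in range(1, T + 1):
        d = max(j for j in range(n) if i <= don_cap[j])    # donor of unit i
        r = min(j for j in range(n) if i <= recv_cap[j])   # receiver of unit i
        S[d][r] += 1
        new_sums[d] -= 1
        new_sums[r] += 1
    return new_sums, S

new_sums, S = shift_units([5, 4, 3, 2], 3, 6, 1)
```

In this example the rows with sums 3 and 2 are emptied down to L = 1 while the rows with sums 5 and 4 are filled up to H = 6, and the total number of units is unchanged.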

procedure CORRECTION(M, S, pert)

(( This procedure computes one correction phase. The inputs are a realization
matrix, M, a matrix, S, containing amounts of units to be slid and a variable, pert,
indicating if units need to be slid between rows or columns. The output is the
matrix, M, after it has been corrected. ))

(1) Let n = length of S.

(2) Construct the donation graph, G, where:

$V(G) = \{1, \ldots, n\}$, $E(G) = \{\, \{i,j\} \mid S[i,j] > 0 \,\}$

(3) For every connected component, T, of G do I.P:

(3.1) Partition T into two constellations, $C_1$ and $C_2$.

(3.2) Perform SLIDE-UNITS(C, M, S, pert) for every connected
component, C, of $C_1$ I.P.

(3.3) Perform SLIDE-UNITS(C, M, S, pert) for every connected
component, C, of $C_2$ I.P.

(4) Return M.
end CORRECTION

procedure SLIDE-UNITS(C, M, S, pert)

(( Units are slid in the matrix M, between one donor and many receivers or one
receiver and many donors. The vertices of the star, C, are the participating
rows/columns of M. The matrix, S, contains the numbers of units to be slid and
the variable pert indicates if units need to be slid between rows or columns. ))

(1) Let c be the unique non-leaf of C (( If C has exactly two vertices let c be
any one of them )). Let $l_1, \ldots, l_d$ be the remaining vertices of C.

(2) If pert = "rows" then let $M_c, M_{l_1}, \ldots, M_{l_d}$ be rows $c, l_1, \ldots, l_d$ of M.

Else let $M_c, M_{l_1}, \ldots, M_{l_d}$ be columns $c, l_1, \ldots, l_d$ of M.

(3) If $S[c,l_1] > 0$ (( i.e. c is a donor and the $l_i$ are receivers )) then complement
$M_c, M_{l_1}, \ldots, M_{l_d}$ I.P, and set comp to true.

Let $s_i = \max\{S[l_i,c],\ S[c,l_i]\}$ (( the number of units to be slid from $M_c$ to $M_{l_i}$ ))
for $1 \le i \le d$.

(4) Construct the bipartite graph, $B = (X,Y,E)$:

$X = \{\, x_j \mid M_c[j] = 1 \,\}$

$Y = \{\, y_{i,k} \mid 1 \le i \le d,\ 1 \le k \le s_i \,\}$

$E = \{\, \{x_j, y_{i,k}\} \mid M_{l_i}[j] = 0 \,\}$

(5) Compute F, a maximal matching in B.

(6) For all $\{x_j, y_{i,k}\} \in F$ do in parallel: set $M_c[j] = 0$ and $M_{l_i}[j] = 1$.

(7) If comp then complement $M_c, M_{l_1}, \ldots, M_{l_d}$ I.P.

(8) Copy $M_c, M_{l_1}, \ldots, M_{l_d}$ back into their original location in M (( see step (2) )).

end SLIDE-UNITS

procedure BASE-CASE(a, b)

(( Constructs a matrix, M, with row sums a and column sums b, where the number
of different values of elements in a and b is at most five. ))

(1) Let $a_1 > \cdots > a_k$ and $b_1 > \cdots > b_l$ be the values of the elements of a
and b resp., and let $n_1, \ldots, n_k$ and $m_1, \ldots, m_l$ be their respective
multiplicities.

(2) Construct a flow network, N, with vertices $s, t, u_1, \ldots, u_k, v_1, \ldots, v_l$
and the following arcs (for all $1 \le i \le k$, $1 \le j \le l$):

from s to $u_i$ with capacity $n_i \cdot a_i$
from $v_j$ to t with capacity $m_j \cdot b_j$
from $u_i$ to $v_j$ with capacity $n_i \cdot m_j$

(3) Find a max s-t flow in N. For all i,j let $F_{i,j}$ be the flow on the arc $(u_i, v_j)$.

(4) For all i,j construct $M_{i,j}$ as shown in figure 5.4. There are $F_{i,j} \bmod n_i$
selected rows, starting at row $(\sum_{h=1}^{j-1} F_{i,h} + 1) \bmod n_i$ (cyclically); and
$F_{i,j} \bmod m_j$ selected columns, starting at column $(\sum_{h=1}^{i-1} F_{h,j} + 1) \bmod m_j$.

(5) Let M be the appropriate concatenation of the $M_{i,j}$'s.

(6) Return M.
end BASE-CASE

5.2.6  Parallel  Complexity 

The time and processor bounds of our algorithm depend on how we choose to
implement the maximal matching routine. Two competing implementations are
given in [IS] and [Luby]. On a graph with e edges, Israeli and Shiloach's
algorithm takes time $O(\log^3 e)$ and uses $O(e)$ processors on a CRCW PRAM. Luby's
algorithm requires only $O(\log^2 e)$ time on an EREW PRAM, but uses $O(e^2)$
processors. It is straightforward, though somewhat tedious, to verify that all the other
operations in one phase of MATRIX-CONSTRUCTION can be performed with the
resources required for maximal matching (in both the implementations listed
above).

There are $O(\log |M|)$ phases, as proven in section 5.2.2 (where $|M| = nm$). In
a correction phase for rows there are $O(n)$ parallel calls to maximal matching on
bipartite graphs with $O(m^2)$ edges each. When columns are corrected, there are
$O(m)$ calls, each of size $O(n^2)$. Thus the number of processors required is
$O(nm(n+m)) = O(|M|(n+m))$ using [IS], and $O(nm(n+m)^3) = O(|M|(n+m)^3)$
using [Luby]. When $n = \Theta(m)$ the processor requirements are $O(|M|^{1.5})$ and
$O(|M|^{2.5})$ respectively.
5.3 The Symmetric Supply-Demand Problem

In this section we will show how the methodology developed in section 5.2
gives rise to a parallel algorithm for the symmetric problem. Here the input is a
sequence of integers, $f_1 \ge f_2 \ge \cdots \ge f_n$, summing to zero. The goal is to construct
a flow pattern in which every vertex can send up to one unit of flow to any other
vertex such that the flow out of $v_i$ minus the flow into it is $f_i$ (for all $1 \le i \le n$). The
goal can be viewed as constructing an $n \times n$ zero-one matrix, M (where M[i,j] is
the amount of flow sent from vertex i to vertex j) such that, for all i, the
number of ones in row i minus the number of ones in column i is $f_i$. Note that
changing the values along the main diagonal does not change the instance M
describes, so they can all be set to zero at the end of the computation.

Again we start with a network-flow formulation for the problem. The flow
network has n+2 vertices: $s, t, v_1, \ldots, v_n$. If $f_i > 0$ then there is an arc from s to
$v_i$ with capacity $f_i$, and if $f_i < 0$ then there is an arc from $v_i$ to t with capacity $-f_i$.
Also, there is an arc with capacity 1 from $v_i$ to $v_j$ for all $1 \le i,j \le n$, $i \ne j$. Examination of
this network shows that there are only n potential min cuts: of all cuts containing
x vertices with s, the one containing $v_1, \ldots, v_x$ is of smallest capacity. Thus, for
this problem we have a slack vector. An analysis similar to the one in section 5.2.1
shows that, for all $1 \le x \le n$:

$sl(x) = x \cdot (n - x) - \sum_{i=1}^{x} f_i$
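
The slack vector is cheap to compute from prefix sums. The following sketch assumes the input is already sorted in non-increasing order; the function name `slack_vector` is introduced here for illustration only.

```python
from itertools import accumulate

def slack_vector(f):
    """Return [sl(1), ..., sl(n)] with sl(x) = x*(n-x) - (f_1 + ... + f_x)."""
    n = len(f)
    prefix = list(accumulate(f))         # prefix[x-1] = f_1 + ... + f_x
    return [x * (n - x) - prefix[x - 1] for x in range(1, n + 1)]

tight = slack_vector([2, 0, -2])         # every cut is tight
loose = slack_vector([1, 0, -1])
bad = slack_vector([3, 0, -3])           # vertex 1 cannot send 3 units to 2 others
```

A negative entry, as in the last example, corresponds to a cut whose capacity is smaller than the flow that must cross it.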

It  is  interesting  to  note  that  here,  as  opposed  to  the  matrix  construction  problem, 
the  object  describing  the  slacks  (a  vector  of  length  n)  has  a  different  size  (and 
dimension)  than  the  object  being  constructed  (an  nXn  matrix). 

A perturbation phase is performed in the same way as in section 5.2.2, except
that there is only one sequence being perturbed (as opposed to separate row and
column sequences). Again we have the property (similar to proposition 5.3) that
shifting a unit from $f_j$ to $f_i$ ($i < j$) decreases the slacks at entries $i, i+1, \ldots, j-1$
by 1 and does not change the other entries.

A correction phase is, however, trickier than before. The reason is that if a
unit is to be returned from entry i to entry j, it can be done either by sliding a
unit from row i to row j or by sliding a unit from column j to column i. The
equivalent of lemma 5.1 holds here, but for each unit only one of the two ways of
sliding listed above is guaranteed to exist. Furthermore, if we simultaneously try
to slide units in rows and in columns, conflicts may arise (where a conflict is an
attempt to slide two units into the same entry).

Our solution is to perform the correction in two stages: first slide between
rows, then slide between columns. The first stage is identical to a row-correction
phase of section 5.2. The only difference is that the maximal matching computed
does not necessarily cover all the vertices of one side of the bipartite graph, B.
After the first stage, we update the donation matrix (the s(i,j)'s), according to the
numbers of units slid in the first stage. We then perform a column-correction
phase for the resulting problem.

Lemma 5.6: Every maximal matching computed in the second stage is maximum.

Proof: As in section 5.2.3, let $R, D_1, \ldots, D_d$ be the vertices of a star in the
donation graph. Let $B_1 = (X_1,Y_1,E_1)$ be the bipartite graph for sliding between the
rows corresponding to these vertices in the first stage. Let $B_2 = (X_2,Y_2,E_2)$ be the
bipartite graph for sliding between the columns corresponding to these vertices in
the second stage. Then, as in the proof of lemma 5.5, for each vertex in $Y_2$, the
sum of its degrees in $B_1$ and $B_2$ is at least $|Y_1| + 1$. It follows that the degree of
every such vertex in $B_2$ is at least $|Y_2| + 1$. []
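
The final step of the proof can be spelled out as follows. This is a sketch under the assumption that $Y_2$ consists exactly of the vertices of $Y_1$ left uncovered by the maximal matching computed in the first stage, here called $F_1$ (a name introduced only for this illustration). By maximality, every neighbor in $B_1$ of such a vertex $y$ is matched by $F_1$, so:

```latex
\deg_{B_1}(y) \le |F_1| = |Y_1| - |Y_2|
\quad\Longrightarrow\quad
\deg_{B_2}(y) \ge \bigl(|Y_1| + 1\bigr) - \bigl(|Y_1| - |Y_2|\bigr) = |Y_2| + 1 .
```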

Corollary 5.1: Every unit that is perturbed gets slid in one of the two stages.

The base case is solved along the same lines described in section 5.2.4, but a
few more details need to be handled. The base case is when there are at most five
different values, $f_1 > \cdots > f_5$, with respective multiplicities $n_1, \ldots, n_5$. Again
we start by finding a max flow in a constant size network (having 7 vertices:
$s, t, v_1, \ldots, v_5$) to determine the number of units, $F_{i,j}$, in $M_{i,j}$. Now, as opposed to
the previous case, $\sum_{j=1}^{5} F_{i,j}$ needn't be an integer multiple of $n_i$. Therefore, after
distributing units evenly between all rows with the same f value (as described in
section 5.2.4), some of these rows will have p units and some will have p - 1 units
(for some appropriate p). Similarly, not all the columns with the same f value will
necessarily have the same number of units. We overcome this obstacle by
observing that if i and j have the same f value, and if row sum i is greater by one
than row sum j then column sum i should be greater by one than column sum j.
Therefore, the problem is solved by (using terminology of section 5.2.4) selecting
rows and columns in the same order.

Finally  we  note  that  the  algorithm  for  the  symmetric  problem  uses  the  same 
resources  (time  and  number  of  processors)  as  the  matrix  construction  algorithm 
(see  section  5.2.6). 


5.4  Digraph  Construction 

In this section we describe our solution for the problem of constructing a
simple digraph with specified in-degree and out-degree sequences. By "simple" we
mean no self loops and no parallel arcs. Notice that if self loops are allowed, this
problem is exactly the matrix construction problem described in section 5.2. The
digraph construction problem can be stated as follows: given two equal-length
sequences, $(o_1, \ldots, o_n)$ and $(i_1, \ldots, i_n)$ (that are not necessarily sorted!),
construct an $n \times n$ zero-one matrix, M, that has $o_k$ 1's in row k and $i_k$ 1's in column k
(for all $1 \le k \le n$), so that all the elements on the main diagonal of M are zero.

Our solution is based on the algorithm described in section 5.2. We start,
again, by looking at the network flow formulation for this problem. The network is
almost identical to the one in fig. 5.1, except that each vertex on the left is missing
one outgoing arc, and each vertex on the right is missing one incoming arc. It is
convenient to view the missing arcs as existing arcs with capacity zero. We will
call these blocked arcs and the corresponding entries in the realization matrix
blocked entries. Our first goal is to show that in this case too there are only $n^2$
potential minimum cuts. Let $a_1 \ge \cdots \ge a_n$ and $b_1 \ge \cdots \ge b_n$ be the sorted
sequences of out-degrees and in-degrees respectively (i.e. a is obtained by sorting
o and b by sorting i), and let N be the network corresponding to a and b (similar
to the one shown in fig. 5.1). The capacity of the cut $C_{x,y}$ (as shown in fig. 5.2) is,
in this case:

$\mathrm{capacity}(C_{x,y}) = \sum_{i=x+1}^{n} a_i + \sum_{j=y+1}^{n} b_j + x \cdot y - B(x,y)$

where B(x,y) is the number of blocked arcs crossing the cut. Since there is at most
one blocked entry in every row and every column, a simple argument shows that if
$a_x > a_{x+1}$ and $b_y > b_{y+1}$ then this cut has the smallest capacity among all cuts for
which the s side contains x vertices on the left and n - y vertices on the right.
However, if, say, $a_x = a_{x+1}$ then the cut obtained by switching vertices $u_x$ and
$u_{x+1}$ might have smaller capacity, since the number of blocked arcs crossing it
could be greater by one. Therefore, if we want the cuts $C_{x,y}$ to be the only
potential minimum cuts, we need to be careful about the ordering of "row" vertices
corresponding to rows with equal row sums, and similarly for columns. The
conditions we need to enforce on the order are, simply: if $a_x = a_{x+1}$, then the blocked
entry in row x should be in a lower-indexed column than the blocked entry in row
x+1. The symmetrical conditions should hold for columns.

These conditions can be obtained by two rounds of sorting: first sort rows
according to row sums. Sort rows with equal sums according to the corresponding
column sums (i.e. the correspondence given by the o and i sequences), breaking
ties arbitrarily. Now, sort the columns according to column sums. Columns with
equal sums are sorted according to the order of the corresponding rows that was
obtained in the first round. No ties can arise, since there is, at this point, a total
ordering of the rows.
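
The two rounds of sorting can be sketched as below. Two assumptions are made here that the text does not state explicitly: both sequences are sorted in non-increasing order, and rows with equal sums are ordered by non-increasing column sum, so that their blocked entries land in lower-indexed columns first. The function name `preprocess` is hypothetical.

```python
def preprocess(out_deg, in_deg):
    """Return (row_order, col_order): vertex ids in row and column order."""
    n = len(out_deg)
    # Round 1: rows by row sum, ties by the vertex's own column sum.
    row_order = sorted(range(n), key=lambda v: (-out_deg[v], -in_deg[v]))
    row_pos = {v: p for p, v in enumerate(row_order)}
    # Round 2: columns by column sum, ties by the row position from round 1.
    col_order = sorted(range(n), key=lambda v: (-in_deg[v], row_pos[v]))
    return row_order, col_order

out_deg = [1, 2, 2, 1]
in_deg = [2, 1, 2, 1]
row_order, col_order = preprocess(out_deg, in_deg)
col_pos = {v: p for p, v in enumerate(col_order)}
```

The blocked entry of the row holding vertex v sits in column `col_pos[v]`, so the ordering condition can be checked directly on adjacent rows with equal sums.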

After this preprocessing is done, we are ready to proceed along the same lines
as the algorithm described in section 5.2, with a few modifications. The slack
function is now:

$sl(x,y) = \sum_{i=x+1}^{n} a_i - \sum_{j=1}^{y} b_j + x \cdot y - B(x,y)$

By the discussion above, it is again true that an instance is realizable if and only if
its slack matrix is non-negative. If $sl(x,y) = 0$ then M[i,j] = 1 for all
$1 \le i \le x$, $1 \le j \le y$ except for blocked entries, and M[i,j] = 0 for all $x+1 \le i \le n$,
$y+1 \le j \le n$.

The  perturbation  phases  work  identically  here,  since  they  only  deal  with  the 
row  and  column  sums,  and  not  with  the  internal  structure  of  the  realization 

matrix. 

In the correction phases there is a small modification: units should not be slid
into blocked entries. This is fixed by modifying the bipartite graph, B, in the
obvious way. Also, we need to re-examine the proof of lemma 5.5. It works out exactly
right in this case, since it turns out that:

for all i, k: $\mathrm{degree}(y_{i,k}) \ge |Y|$

which is precisely sufficient (see the original proof).

The  only  tricky  modification  turns  out  to  be  for  the  base  case.  Again,  there 
are  at  most  five  different  row  sum  values  and  five  different  column  sum  values. 
The  difficulty  is  that  there  are  blocked  entries  scattered  throughout.  This  spoils  the 
simple  cyclic  realization  that  existed.  We  overcome  this  by  partitioning  the  matrix 
into  finer  sub-matrices  than  in  the  previous  case.  Each  of  the  MtJ's  is  partitioned 
further  so  that  each  sub-matrix  either  contains  no  blocked  entries,  or  contains  a 
blocked  entry  in  every  row  and  column. 

Again we construct a realization in two steps. The first step is to determine
the total number of units in each sub-matrix. This is done, here too, by solving a
max flow problem (where the capacity of a sub-matrix is the number of
non-blocked entries in it). Again, the network here is of constant size, so a max
flow can be computed in constant time. In the second step, the units are
distributed within the sub-matrices. The key here is to deal first with the
sub-matrices containing blocked entries. It is not always possible to select
arbitrary sets of rows and columns, but it is possible to distribute the units so
that the discrepancy between any two rows or any two columns will be at most one
unit. This can be done as follows: say the blocked entries are along the main
diagonal (this will actually always be the case because of the preprocessing),
and let k be the number of rows (and columns) of the sub-matrix. Let $d_r$ (the
r-th diagonal) be the set of entries $(i,j)$ for which $j - i \equiv r \pmod{k}$.
If F units are to be distributed, fill $\bigcup_{r=1}^{\lfloor F/k \rfloor} d_r$,
and place the remaining units in $d_{\lfloor F/k \rfloor + 1}$. An example is
shown in fig. 5.5.

Fig. 5.5: A 5×5 sub-matrix with blocked entries containing 11 units.
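
The diagonal-filling rule for a sub-matrix with blocked entries can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fill_blocked_submatrix` is hypothetical, the blocked entries are taken to lie on the main diagonal, indices are 0-based, and the leftover units are placed in the first rows of the next diagonal (the text does not say which entries of that diagonal receive them).

```python
def fill_blocked_submatrix(k, F):
    """Place F units in a k x k 0/1 matrix whose main diagonal is blocked.

    Diagonal d_r is the set of entries (i, j) with j - i = r (mod k).
    Diagonals d_1 .. d_{F // k} are filled completely and the remaining
    F % k units go into the next diagonal (assumption: its first rows).
    """
    if not 0 <= F <= k * (k - 1):
        raise ValueError("F must lie between 0 and k*(k-1)")
    M = [[0] * k for _ in range(k)]
    full, rest = divmod(F, k)
    # Completely filled diagonals d_1 .. d_full.
    for r in range(1, full + 1):
        for i in range(k):
            M[i][(i + r) % k] = 1
    # Leftover units: one per row in diagonal d_{full+1}.
    for i in range(rest):
        M[i][(i + full + 1) % k] = 1
    return M


if __name__ == "__main__":
    # Same size and unit count as the example in fig. 5.5 (layout may differ).
    for row in fill_blocked_submatrix(5, 11):
        print(row)
```

Every full diagonal adds exactly one unit to each row and each column, and the partial diagonal adds at most one more, which is why any two row sums (or column sums) differ by at most one.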


Now, after the "problematic" sub-matrices have been dealt with, we can construct
the sub-matrices with no blocked entries in the same fashion as described in
section 5.2.4. The same arguments for proving validity of the scheme go through,
because there is at most one blocked entry in every row or column.


5.5  Bounds  on  Supplies  and  Demands 

Our  parallel  algorithm  for  the  matrix  construction  problem  can  be  extended  to 
the  case  in  which  the  sequences  a  and  b  represent  upper  bounds  on  row  sums  and 
lower  bounds  on  column  sums  respectively.  This  is  a  natural  extension  of  the 
matrix  construction  problem  when  rows  represent  supplies  and  columns  represent 
demands. 

Let $U = \sum_{i=1}^{n} a_i$ and $L = \sum_{j=1}^{m} b_j$. Let M be a realization
matrix for the instance $(a, b)$, and let S be the number of 1's in M. Then,
clearly, $L \le S \le U$. Say we fix S. Then the problem boils down to the
following: modify the sequences $a$ and $b$ to obtain $\alpha$ and $\beta$
respectively so that:

(1) $\alpha_i \le a_i$ and $\beta_j \ge b_j$ for all $1 \le i \le n$, $1 \le j \le m$.

(2) $\sum_{i=1}^{n} \alpha_i = \sum_{j=1}^{m} \beta_j = S$.

(3) $(\alpha, \beta)$ is realizable.

It is, of course, not always possible to satisfy all three conditions simultaneously.
Thus our goal is to find such a pair of sequences if it exists.

The key for obtaining the sequences $\alpha$ and $\beta$ is to consider the slack
matrix, as defined in section 2.1. Recall that the condition for realizability is
that all the slacks are non-negative, and that:

$$s_{x,y} = \sum_{i=x+1}^{n} a_i - \sum_{j=1}^{y} b_j + x \cdot y$$

where $a_1 \ge \cdots \ge a_n$ and $b_1 \ge \cdots \ge b_m$.
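
As a concrete illustration of the slack condition, the following Python sketch computes the slack for every pair (x, y) with prefix sums and checks that none is negative. It is a sequential sketch, not the parallel procedure of the text. The names `slack_matrix` and `is_realizable` are hypothetical, both sequences are assumed to consist of non-negative integers, and the ranges 0 ≤ x ≤ n, 0 ≤ y ≤ m and the equal-sums check are assumptions spelled out in the code.

```python
from itertools import accumulate


def slack_matrix(a, b):
    """Return s[x][y] = sum_{i>x} a_i - sum_{j<=y} b_j + x*y.

    Both sequences are sorted in non-increasing order first
    (assumption: x ranges over 0..n and y over 0..m).
    """
    a = sorted(a, reverse=True)
    b = sorted(b, reverse=True)
    prefix_a = [0] + list(accumulate(a))  # prefix_a[x] = a_1 + ... + a_x
    prefix_b = [0] + list(accumulate(b))  # prefix_b[y] = b_1 + ... + b_y
    total_a = prefix_a[-1]
    return [
        [total_a - prefix_a[x] - prefix_b[y] + x * y for y in range(len(b) + 1)]
        for x in range(len(a) + 1)
    ]


def is_realizable(a, b):
    """Row sums a and column sums b admit a 0/1 matrix iff the totals
    agree and every slack is non-negative."""
    if sum(a) != sum(b):
        return False
    return all(s >= 0 for row in slack_matrix(a, b) for s in row)
```

For example, row sums (3, 1) with column sums (2, 2, 0) fail the test, because the first row would need a unit in the column whose sum is 0; the slack at x = 1, y = 2 is negative.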


Lemma 5.7: Let $a_1 \ge \cdots \ge a_n$ and $b_1 \ge \cdots \ge b_m$. Let
$\alpha(k)$ ($\beta(l)$) be the sequence obtained from $a$ ($b$) by subtracting 1
from $a_k$ (adding 1 to $b_l$). Then $(\alpha(1), \beta(m))$ is realizable if
$(\alpha(k), \beta(l))$ is (for any $1 \le k \le n$, $1 \le l \le m$).


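Proof: For every $x$ and $y$, the slack of $(\alpha(1), \beta(m))$ minus the slack
of $(\alpha(k), \beta(l))$ equals

$$\sum_{i=x+1}^{n} (\alpha(1)_i - \alpha(k)_i) + \sum_{j=1}^{y} (\beta(l)_j - \beta(m)_j)$$

It is easy to see that for all values of x, y, k and l this difference is
non-negative, which proves the lemma. □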


Theorem 5.1: Let $\alpha(S)$ be obtained from $a$ by repeatedly subtracting 1 from
the largest element $U - S$ times, and let $\beta(S)$ be obtained from $b$ by
repeatedly adding 1 to the smallest element $S - L$ times. Then
$(\alpha(S), \beta(S))$ is realizable if there is any realizable pair of sequences
$(\gamma, \delta)$ where $\gamma_i \le a_i$, $\delta_j \ge b_j$ for
$1 \le i \le n$, $1 \le j \le m$ and
$\sum_{i=1}^{n} \gamma_i = \sum_{j=1}^{m} \delta_j = S$.

Proof: By induction on $U - S$ using lemma 5.7. □

$(\alpha(S), \beta(S))$ can be obtained from $(a, b)$ efficiently in parallel by a
simple partial-sums computation. The algorithm is:

(1) For all S, $L \le S \le U$, do in parallel:

(1.1) Compute $\alpha(S)$ and $\beta(S)$.

(1.2) Test if $(\alpha(S), \beta(S))$ is realizable (using the method described in [FF]).

(2) Select an S for which $(\alpha(S), \beta(S))$ is realizable.

(3) Compute M = MATRIX-CONSTRUCTION($\alpha(S)$, $\beta(S)$).
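
Steps (1) and (2) can be sketched sequentially in Python as follows. This is a sketch under assumptions, not the parallel implementation: the loop over S is run one value at a time, the levelling uses a heap instead of partial sums, and the realizability test is the slack condition rather than a quotation of the method in [FF]. The names `level_down`, `level_up`, `is_realizable` and `feasible_totals` are hypothetical, and the sequences are returned sorted, so the mapping back to the original rows and columns is omitted. Step (3) is not shown.

```python
import heapq
from itertools import accumulate


def level_down(a, t):
    """Subtract 1 from the current largest element, t times."""
    heap = [-v for v in a]  # negate values to get a max-heap
    heapq.heapify(heap)
    for _ in range(t):
        heapq.heapreplace(heap, heap[0] + 1)  # largest v becomes v - 1
    return sorted((-v for v in heap), reverse=True)


def level_up(b, t):
    """Add 1 to the current smallest element, t times."""
    heap = list(b)
    heapq.heapify(heap)
    for _ in range(t):
        heapq.heapreplace(heap, heap[0] + 1)
    return sorted(heap, reverse=True)


def is_realizable(a, b):
    """Slack test; a and b must be sorted in non-increasing order."""
    if sum(a) != sum(b):
        return False
    pa = [0] + list(accumulate(a))
    pb = [0] + list(accumulate(b))
    return all(
        pa[-1] - pa[x] - pb[y] + x * y >= 0
        for x in range(len(a) + 1)
        for y in range(len(b) + 1)
    )


def feasible_totals(a, b):
    """Map every S in L..U whose levelled pair is realizable to that pair.

    a holds upper bounds on row sums, b lower bounds on column sums.
    """
    U, L = sum(a), sum(b)
    result = {}
    for S in range(L, U + 1):
        alpha = level_down(a, U - S)
        beta = level_up(b, S - L)
        if is_realizable(alpha, beta):
            result[S] = (alpha, beta)
    return result
```

With upper bounds (3, 3) on two rows and lower bounds (0, 0) on two columns, `feasible_totals` keeps S = 0, ..., 4 and rejects S = 5 and S = 6, since a 2×2 matrix holds at most four units. Any selection criterion for step (2) can then be applied to the keys of the returned dictionary.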


Steps (1.1) and (1.2) are simple partial-sum computations, and can be
implemented using $O(n+m)$ processors. Since steps (1) and (2) can be implemented
within the time and processor bounds used for step (3), the algorithm has the same
parallel complexity as the matrix construction algorithm. Note that we may
perform step (2) with some criterion in mind (e.g. "construct a matrix with the
smallest possible number of 1's subject to

The  extension  of  the  symmetric  supply-demand  problem  turns  out  to  be  even 
simpler.  Here  the  natural  extension  would  be  that  all  the  f  values  represent  upper 
bounds,  since  making  a  number  "less  positive"  corresponds  to  less  supply,  and 
making  a  number  "more  negative"  corresponds  to  more  demand.  So  in  an  instance 
of  this  problem,  the  positive  numbers  would  sum  up  to  +H  and  the  negative 
numbers  would  sum  up  to  — L,  for  some  H>L. 

Here,  as  opposed  to  the  matrix  construction  problem,  it  is  clear  which  value  of 
S  works  best  (where  S  is  the  sum  of  the  positive  entries,  and  minus  the  sum  of  the 
negative  entries).  By  looking  at  the  expressions  for  the  slack  vector,  one  can  see 
that  decreasing  S  cannot  ruin  feasibility.  Therefore  S  should  be  selected  to  be  as 
small  as  possible,  i.e.  S=L. 

To summarize, only the positive f entries should be modified. Again, as in the
matrix construction problem, the best way to modify these numbers is to
repeatedly subtract one unit from the largest entry until H−L units have been
subtracted.
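
A minimal sequential sketch of this adjustment, keeping every entry at its original position, is given below. The name `trim_supplies` is hypothetical, the entries of f are assumed to be integers, and the sketch only performs the adjustment; it does not test whether the resulting instance is feasible.

```python
import heapq


def trim_supplies(f):
    """Lower the positive entries of f until they sum to L.

    H is the sum of the positive entries and L is minus the sum of the
    negative ones. One unit at a time is taken from the currently largest
    positive entry until H - L units have been removed.
    """
    H = sum(v for v in f if v > 0)
    L = -sum(v for v in f if v < 0)
    if H < L:
        raise ValueError("total supply is smaller than total demand")
    g = list(f)
    # Max-heap over the positive entries, stored as (-value, index).
    heap = [(-v, i) for i, v in enumerate(g) if v > 0]
    heapq.heapify(heap)
    for _ in range(H - L):
        neg_value, i = heap[0]
        g[i] -= 1
        heapq.heapreplace(heap, (neg_value + 1, i))
    return g
```

For instance, `trim_supplies([5, 2, -1, -3])` has H = 7 and L = 4, so three units are removed from the first entry, giving `[2, 2, -1, -3]`.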




References 

[AHU] Aho, A.V., Hopcroft, J.E. and Ullman, J.D., "The Design and Analysis of
Computer Algorithms", Addison-Wesley, 1974.

[Atal] Atallah, M.J., "Parallel Strong Orientation of an Undirected Graph",
Information Processing Letters, 18, pp. 37-39, 1984.

[BT] Boesch, F. and Tindell, R., "Robbins's Theorem for Mixed Graphs", Amer.
Math. Monthly, 87, pp. 716-719, 1980.

[BW] Beineke, L.W. and Wilson, R.J. eds., "Selected Topics in Graph Theory",
Academic Press, 1978.

[Berge] Berge, C., "Graphs", North Holland, 1985.

[CGT] Chung, F.R.K., Garey, M.R. and Tarjan, R.E., "Strongly Connected
Orientations of Mixed Multigraphs", Networks, 15, pp. 477-484, 1985.

[CL] Chartrand, G. and Lesniak, L., "Graphs & Digraphs", 2nd ed., Wadsworth &
Brooks/Cole, 1986.

[CW] Coppersmith, D. and Winograd, S., "Matrix Multiplication via Arithmetic
Progression", Proc. 19th ACM Symp. on Theory of Computing, pp. 1-6, 1987.

[Camion] Camion, P., "Chemins et Circuits Hamiltoniens des Graphes Complets",
C. R. Acad. Sci. Paris (A) 249, pp. 2151-2152, 1959.

[Cole] Cole, R., "Parallel Merge Sort", Proc. 27th IEEE Symp. on Foundations of
Comp. Sci., pp. 511-519, 1986.

[Cook1] Cook, S.A., "Towards a Complexity Theory of Synchronous Parallel
Computation", L'Enseignement Mathématique XXVII, pp. 99-124, 1981.

[Cook2] Cook, S.A., "A Taxonomy of Problems with Fast Parallel Algorithms",
Information and Control, 64, pp. 2-22, 1985.

[ET] Eswaran, K.P. and Tarjan, R.E., "Augmentation Problems", SIAM J. on
Computing, 5, pp. 653-665, 1976.

[FF] Ford, L.R. and Fulkerson, D.R., "Flows in Networks", Princeton University
Press, 1962.




[Fich] Fich, F., "New bounds for Parallel Prefix Circuits", Proc. 15th ACM Symp.
on Theory of Computing, pp. 100-109, 1983.

[GGKMRS] Gottlieb, A., Grishman, R., Kruskal, C.P., McAuliffe, K.M.,
Rudolph, L. and Snir, M., "The NYU Ultracomputer - Designing an MIMD
Shared Memory Parallel Computer", IEEE Trans. Comput., C-32 (2), pp. 175-189,
1983.

[GJ] Garey, M.R. and Johnson, D.S., "Computers and Intractability", W.H.
Freeman and Company, 1979.

[GP] Galil, Z. and Pan, V., "Improved Processor Bounds for Algebraic and
Combinatorial Problems in RNC", Proc. 26th IEEE Symp. on Foundations of
Comp. Sci., pp. 490-495, 1985.

[GSS] Goldschlager, L.M., Shaw, R.A. and Staples, J., "The Maximum Flow
Problem is Logspace Complete for P", Theoretical Computer Science, 21, pp.
105-111, 1982.

[GT] Goldberg, A. and Tarjan, R.E., "A New Approach to the Maximum Flow
Problem", Proc. 18th ACM Symp. on Theory of Computing, pp. 136-146, 1986.

[Gale] Gale, D., "A Theorem on Flows in Networks", Pacific J. Math., 7, pp.
1073-1082, 1957.

[Gusf] Gusfield, D., "Optimal Mixed Graph Augmentation", SIAM J. on
Computing, 16 (4), pp. 599-612, 1987.

[Hillis] Hillis, W.D., "The Connection Machine", MIT Press, 1985.

[IS] Israeli, A. and Shiloach, Y., "An Improved Parallel Algorithm for Maximal
Matching in a Graph", Information Processing Letters, 22, pp. 57-60, 1986.

[KU] Karlin, A.R. and Upfal, E., "Parallel Hashing - an Efficient Implementation
of Shared Memory", Proc. 18th ACM Symp. on Theory of Computing, pp. 160-168,
1986.

[KUW1] Karp, R.M., Upfal, E. and Wigderson, A., "Are Search and Decision
Problems Computationally Equivalent?", Proc. 17th ACM Symp. on Theory of
Computing, pp. 464-475, 1985.

[KUW2] Karp, R.M., Upfal, E. and Wigderson, A., "Constructing a Perfect
Matching is in Random NC", Combinatorica, 6 (1), pp. 35-48, 1986.




[Ladner] Ladner, R.E., "The Circuit Value Problem is Log Space Complete for P",
SIGACT News, 7 (1), pp. 18-20, 1975.

[Lawler] Lawler, E.L., "Combinatorial Optimization, Networks and Matroids",
Holt, Rinehart and Winston, 1976.

[Luby] Luby, M., "A Simple Parallel Algorithm for the Maximal Independent Set
Problem", Proc. 17th ACM Symp. on Theory of Computing, pp. 1-10, 1985.

[MR] Miller, G.L. and Reif, J.H., "Parallel Tree Contraction and its Application",
Proc. 26th IEEE Symp. Foundations of Comp. Sci., pp. 478-489, 1985.

[MW] Mulmuley, K., Vazirani, U.V. and Vazirani, V.V., "Matching is as Easy
as Matrix Inversion", Proc. 19th ACM Symp. on Theory of Computing,

[Moon] Moon, J.W., "Topics on Tournaments", Holt, Rinehart & Winston, 1968.

[Naor] Naor, J., "Two Parallel Algorithms in Graph Theory", Technical Report
CS-86-6, Department of Computer Science, Hebrew University, June 1986.

[NS] Nisan, N. and Soroker, D., "Parallel Algorithms for Zero-One
Supply-Demand Problems", Report No. UCB/CSD 87/368, Computer Science Division,
University of California, Berkeley, August 1987.

[PS] Papadimitriou, C.H. and Steiglitz, K., "Combinatorial Optimization:
Algorithms and Complexity", Prentice-Hall, 1982.

[Pipp] Pippenger, N., "On simultaneous Resource Bounds", Proc. 20th IEEE
Symp. Foundations of Comp. Sci., pp. 307-311, 1979.

[RT] Rettberg, R. and Thomas, R., "Contention is no Obstacle to Shared-Memory
Multiprocessing", CACM, 29 (12), pp. 1202-1212, 1986.

[Redei] Redei, L., "Ein Kombinatorischer Satz", Acta Litt. Sci. Szeged, 7, pp.
39-43, 1934.

[Renade] Renade, A.G., "How to Emulate Shared Memory", Proc. 28th IEEE
Symp. on Foundations of Comp. Sci., pp. 185-194, 1987.

[Robbin] Robbins, H., "A Theorem on Graphs with an Application to a Problem of
Traffic Control", Amer. Math. Monthly, 46, pp. 281-283, 1939.

[Rober] Roberts, F.S., "Applied Combinatorics", Prentice Hall, 1984.

[Ryser] Ryser, H.J., "Traces of Matrices of Zeros and Ones", Canad. J. Math., 9,
pp. 463-476, 1960.

[SASLMW] Schneck, P.B., Austin, D., Squires, S.L., Lehmann, J., Mizell, D.
and Wallgren, K., "Parallel Processor Programs in the Federal Government",
IEEE Computer, 18 (6), pp. 43-56, 1985.

[SV] Shiloach, Y. and Vishkin, U., "An O(log n) Parallel Connectivity
Algorithm", J. of Algorithms, 3, pp. 57-67, 1982.

[Soro1] Soroker, D., "Fast Parallel Algorithms for Finding Hamiltonian Paths
and Cycles in a Tournament", J. of Algorithms, to appear.

[Soro2] Soroker, D., "Fast Parallel Strong Orientation of Mixed Graphs and
Related Augmentation Problems", J. of Algorithms, to appear.

[Soro3] Soroker, D., "Optimal Parallel Construction of Prescribed Tournaments",
Report No. UCB/CSD 87/371, Computer Science Division, University of
California, Berkeley, September 1987.

[TV] Tarjan, R.E. and Vishkin, U., "An Efficient Parallel Biconnectivity
Algorithm", SIAM J. on Computing, 14 (4), pp. 862-874, 1985.

[Tsin] Tsin, Y.H., "An Optimal Parallel Processor Bound in Strong Orientation of
an Undirected Graph", Information Processing Letters, 20, pp. 143-146, 1985.

[Upfal] Upfal, E., "A Probabilistic Relation Between Desirable and Feasible
Models of Parallel Computation", Proc. 16th ACM Symp. on Theory of
Computing, pp. 258-265, 1984.

[Vish1] Vishkin, U., "Synchronous Parallel Communication - a Survey", TR 71,
Dept. of Computer Science, Courant Institute, NYU, 1983.

[Vish2] Vishkin, U., "On Efficient Parallel Strong Orientation", Information
Processing Letters, 20, pp. 235-240, 1985.


